1.0 INTRODUCTION


This report is exclusively concerned with the theory, development and
fabrication of digital charge coupled devices (DCCD's). In general, any
digital domain function can be performed with charge coupled devices; in
particular, this includes digital logic and arithmetic functions. The
development of DCCD's was undertaken with the goal of achieving both the
traditional advantages associated with all digital circuits and some
advantages unique to CCD technology. Before addressing DCCD's unique
characteristics, a review of the basic advantages of digital logic in
general is provided below for reference:

(1)  Freedom  from  parameter  variations 

(2)  Freedom  from  environmental  variations 

(3)  Application  flexibility 

(4)  Easily  programmed 

(5)  Flexibility  in  selection  of  bit  accuracy 

(6)  Well  known  characteristics  that  are  easily  modeled  and  simulated 

(7)  Widespread  use  reduces  costs 

These traditional advantages have facilitated the acceptance and widespread
use of digital devices. DCCD's also provide some unique advantages. These
include:

(8)  Relatively  low  power  requirements 

(9)  Very  high  device/circuit  density  when  compared  to  various  bipolar 
(digital)  signal  processing  technologies 

A review of the unique aspects of digital CCD technology indicates a
variety of device characteristics that are unobtainable with other
technologies. Low power is clearly desirable for applications that are space
or man-pack related. The high functional density of CCD's can be used in
those instances where a large number of arithmetic calculations are required
to perform an overall system function. DCCD permits the circuit designer
to place a large number of arithmetic functions on a single chip, thereby
eliminating interface and overhead circuitry and significantly reducing such
factors as chip count, power supply weight, and subsystem volume.

1.1 DCCD IMPLEMENTATION


DCCD technology is most adaptable to binary logic applications; each
CCD storage position represents either a one or a zero, depending upon
whether or not it contains a stored charge. CCD's can therefore be used to
implement Boolean arithmetic in a straightforward manner.

The catalog of CCD devices includes half-adders and full-adders, along
with logic functions such as AND's and OR's; these can be used to implement
any arbitrary logic or arithmetic function. Because the charge stored
under each gate is shifted at each clock pulse, all functions must be
performed in pipeline sequence, rather than by "ripple-through" logic.
Pipeline calculations in arithmetic units are required due to the generation
of the "carry" bit at each stage. For example, in the addition of two N-bit
words, the two least significant bits can be added immediately and produce
both sum and carry outputs. The carry bit is then combined with the next
significant bit and produces a new sum and carry output. The carry is delayed
during each operation, requiring the delay of the next significant bit by an
equal amount; this necessitates the use of delays on DCCD device input
lines. An analogous set of delays is also inserted in series with the output
lines, so that the entire output word is available during one subsequent clock
pulse. These delays can easily be provided by CCD shift registers, with the
attendant penalty that delays require a larger active area for DCCD logic
functions. This added area can be removed in large-scale functions, where
skewed arithmetic is permitted (skewed output of one function driving the
skewed input of another function).

Employing skewed arithmetic involves synchronous data inputs to the
chip that pass through a set of delays which properly skew all of the bits.
Then an arithmetic operation (such as addition or multiplication) is
performed and the resultant data is shifted to another operation. When all
arithmetic operations have been completed, the data is once more passed through
a set of delays that again synchronizes all bits, to make them simultaneously
available to the output pins during a single clock period.
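
The skew, add, and de-skew sequence described above can be illustrated with a small behavioural simulation. This is only a sketch under stated assumptions, not the circuit described in this report: it assumes one clock period per carry hop between adjacent bit stages, models each input and output delay as a simple shift register, and uses hypothetical names (full_adder, pipelined_add) chosen for illustration.

```python
from collections import deque


def full_adder(a, b, c):
    """Return (sum, carry) for three one-bit inputs."""
    total = a + b + c
    return total & 1, total >> 1


def pipelined_add(pairs, n_bits):
    """Stream (x, y) word pairs through an n_bits-stage pipelined adder.

    One new pair enters per clock. Bit i of each operand is delayed by
    i clocks (input skew), the carry moves one stage per clock, and sum
    bit i is delayed by n_bits - 1 - i clocks (output de-skew) so that a
    whole result word appears during a single clock period.
    """
    # Input skew registers: stage i holds i pending bit pairs.
    in_delay = [deque([(0, 0)] * i) for i in range(n_bits)]
    # Output de-skew registers: stage i holds n_bits - 1 - i pending sum bits.
    out_delay = [deque([0] * (n_bits - 1 - i)) for i in range(n_bits)]
    carry_reg = [0] * n_bits  # carry into stage i, latched on the previous clock
    results = []
    for t in range(len(pairs) + n_bits - 1):
        x, y = pairs[t] if t < len(pairs) else (0, 0)  # flush with zeros
        next_carry = [0] * (n_bits + 1)
        out_bits = []
        for i in range(n_bits):
            in_delay[i].append(((x >> i) & 1, (y >> i) & 1))
            a, b = in_delay[i].popleft()
            s, next_carry[i + 1] = full_adder(a, b, carry_reg[i])
            out_delay[i].append(s)
            out_bits.append(out_delay[i].popleft())
        carry_out = next_carry[n_bits]
        carry_reg = next_carry[:n_bits]
        if t >= n_bits - 1:  # the first complete word emerges after the fill time
            word = sum(bit << i for i, bit in enumerate(out_bits))
            results.append(word | (carry_out << n_bits))
    return results
```

In this model a new operand pair is accepted on every clock, and each result appears n_bits - 1 clocks after its operands entered; the series delay does not reduce the word rate.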
In general, the majority of the delays associated with arithmetic
operations can be deleted for functions performed internal to the chip.
Only initial skewing delays and final de-skewing delays are required. During
arithmetic operations on such a chip, all data can be manipulated in skewed
fashion. There are other implications resulting from the use of pipeline
arithmetic: since data enters at one clock pulse and exits during a subsequent
clock pulse, random calculations cannot be handled in an efficient manner.
DCCD technology is therefore best suited to applications in which random
calculations occur only occasionally. A large number of algorithms are already
available in pipeline organization, or can be restructured for pipeline use,
so that the application of DCCD's is not significantly restricted.

It is worth noting that pipeline arithmetic calculations can provide a
very high throughput rate; data enters the device at the maximum clock rate
and subsequently exits at the same clock rate. The designer need only
consider the series delay that is necessarily a part of pipeline operation.

1.2  PROGRAM  HISTORY 

In 1973, the Naval Research Laboratory issued a Request for Quotation
for a study program aimed at defining and analyzing the impact of CCD
technology on signal processing systems. Implicit in such a statement was
the requirement to determine those areas of signal processing systems where
CCD's offered an economic advantage. The extent of that advantage or eventual
impact could then be projected. This projection could not be made in terms
of dollars and cents alone; it was to be made on the basis of a direct
comparison with identical functions provided by competing technologies.
Parameters such as speed, power requirements and parts count could be
tabulated for subsequent comparisons.

As a result of the proposal submitted to the Naval Research Laboratory,
TRW initiated a study of the impact of digital CCD's on signal processing
systems. The results have been issued under the title, "Charge Coupled
Devices in Signal Processing Systems; Volume I: Digital Signal Processing."*

*Available from the National Technical Information Service. A companion
report titled, "Charge Coupled Devices in Signal Processing Systems; Volume
II: Analog Signal Processing" is also available.

This study indicated that digital CCD's combine the inherent advantages of any
digital technology (such as high noise immunity, freedom from device parameter
variations, stable operating conditions and ease of simulation) with
the advantages peculiar to CCD's (such as high density and low power
requirements). In addition, digital CCD's are best suited to signal processing
applications where the signal flow can be handled by pipeline operations that
require little or no feedback; this permits relatively high data throughput
with relatively low CCD clock frequencies. The impact of DCCD's is most
dramatic in those applications where a large number of functions and/or high
computational accuracy is demanded. A large number of these potential
applications for existing and projected systems were identified and analyzed
in detail.

At the conclusion of this study, TRW recommended experimental verification
that would go beyond basic device operation and verification. An
opportunity was sought to demonstrate the real advantages of the digital
CCD approach. The design of a digital CCD Fast Fourier Transform (FFT) on a
chip was selected as a useful vehicle. This function would be quite flexible
and suited to a number of diverse applications. Accordingly, a technology
development program was initiated.

The objective of the first phase of this DCCD Program was the investigation
and characterization of fundamental DCCD building blocks that would be
employed in a typical signal processing system application. The results of
this Phase I program included further development of a full adder circuit
function, the design and test of a 4 + 4 adder, and a 3 x 3 multiplier array.
A study was initiated to provide a method of interconnecting a number of FFT
chips. The Phase I results were issued under the title, "Charge Coupled
Devices in Signal Processing Systems; Volume III: Digital Function
Feasibility Demonstration."*

The  objective  of  the  second  phase  of  this  program  was  to  develop  large 
logic  building  blocks  suitable  for  implementing  a  FFT  or  similar  function. 

Near the end of Phase II, a potential application in voice processing arose
that would require 16-bit arithmetic blocks, i.e., a 16 x 16 multiplier and a
32 + 32 adder/subtractor. At the end of the thirteen-month Phase II effort,
work was completed on an 8-bit arithmetic block design. Work on larger
blocks was continued into Phase III, including the design of a 32 + 32 adder/
subtractor.

*Available from the National Technical Information Service.


The  objective  of  the  third  phase,  described  in  this  report,  was  to 
complete  the  work  on  the  32  +  32  adder/subtractor;  then  to  work  on  the  16  x  16 
multiplier  and  smaller,  associated  circuitry.  As  the  work  on  the  32  +  32 
bit  adder/subtractor  layout  neared  completion,  two  factors  influenced  a 
change  in  program  objectives: 

1.  The  need  for  the  Phase  III  DCCD  circuits  was  eliminated  as  a 
result  of  a  change  in  the  voice  processor  program. 

2. The visibility provided by the 32 + 32 adder/subtractor design
and layout effort indicated that the cost and schedule
associated with a 16 x 16 multiplier were beyond the remaining
resources of the Phase III program.

At that time, mutual customer/contractor interest arose in a unique application
area, involving manipulation of lists of digital words that were arranged
in order of their magnitude. This sort-and-merge-type application appeared
well suited to the characteristics of DCCD functions. Various algorithms for
performing these functions were examined and DCCD layouts were investigated.
A specialized LIFO (Last In - First Out) memory, needed to implement the sort
and merge function, was designed and its layout completed. Due to program cost
and schedule limitations, no further LIFO activities were pursued.


The  chronology  of  events  is  summarized  in  Figure  1-1. 


[Figure 1-1: timeline chart spanning calendar years 1973 through 1978. Rows:
Won Digital Study Contract; Digital Study Phase; Navy Requests Analog Study;
Analog Study Phase; Phase 1 (Building Blocks); Phase 2 (Computational
Functions); Phase 3 (Demonstration Unit).]

Figure 1-1. Chronology of Program.

1.3 PHASE I REPORT SUMMARY

This report contained an overview of the entire program and a brief
statement of goals and approaches. This was followed by a discussion covering
development of the full-adder circuit function. The original concept was
explained and subsequent alterations to the original layout were described.
Both two and three input adders were discussed and hardware implications in
the several computational algorithms available were examined. The primary
test mask that was designed during Phase I was presented, along with a
summary of the test results. The process sequences employed to produce these
devices were explained, and cross-sectional views of the devices were provided.
This was followed by the results of a study to determine methods of
interconnecting a number of the projected FFT chips into a single system. The
report concluded with recommendations for future work.

1.4  PHASE  II  REPORT  SUMMARY 

This report contained a commentary on the advantages of digital charge
coupled logic (DCCL) and made a comparison with other current high density/
lower power LSI technologies. A description of the basic equations necessary
for designing DCCL logic gates was included and the design of various logic
cells and arithmetic functions was discussed. The principles used in the
design of the pipelined multiplier and adder/subtractor arrays were discussed
and the clocking schemes and test results obtained for both arithmetic
arrays and single arithmetic functions were described. The metal/polysilicon
and double polysilicon fabrication processes used to fabricate these devices
were described. The report concluded with recommendations for future work.

1.5  PHASE  III  REPORT  SUMMARY 

This report contains a history of the complete DCCL program. It traces
the development of the small scale building blocks (the half adder, full
adder and AND and OR gates) up through the larger scale arithmetic functions
of multiplication, addition, and subtraction. This report also deals, in
generic terms, with the signal and data processing applications that are
suited to DCCL technology. The programs that have used this technology to
date are described. This is followed by a description of the computer aids
and fabrication techniques developed specifically for this technology. This
report concludes with recommendations for future work and a list of patents
and publications resulting from this effort.




2.0  SUMMARY  AND  CONCLUSIONS 


2.1  SUMMARY 

The  development  of  digital  charge  coupled  device  (DCCD)  circuits  at  TRW 
has  evolved  through  a  number  of  identifiable  stages  since  its  inception  in 
1973.  The  primary  areas  of  design,  layout,  and  processing  have  all  experienced 
this  evolutionary  growth  in  significant  ways. 

2.1.1  Design 

The  design  realization  of  DCCD  logic  and  arithmetic  circuits  presented  a 
number  of  very  difficult  concept  and  modeling  problems  during  the  course  of 
the  program.  The  basic  adder  cells  emerged  as  both  the  most  difficult  and 
the  most  essential  circuits  for  performing  DCCD  logic  and  arithmetic  functions. 
Both  a  full  adder  and  a  half  adder  cell  were  developed  during  the  course  of 
this program and are described in detail elsewhere in this report. Basically,
the full adder cell possesses the advantages of higher density and fewer
throughput delays than the dual half adder, while the half adder cell is
faster and less sensitive to operating biases and threshold shifts. The half
adder has the additional feature of being easily configurable into other
essential DCCD functions such as charge refresh and logical "AND" circuits.
The overall flexibility and performance characteristics of the half adder
cell resulted in its selection over the full adder cell for the majority of
analysis and test efforts.
This  report  delineates  the  logical  development  of  these  adder  cells  from  a 
totally  isolated  (floating)  gate  design,  through  two  floating  gate  reset 
techniques,  leading  to  the  present  floating  diffusion  design.  Computer  models, 
which  were  developed  and  verified  during  this  program,  have  been  used  to  refine 
the  present  adder  design  into  a  stable,  predictable  configuration  exhibiting 
high  noise  immunity  and  speeds  of  5-10  MHz. 

2.1.2  Layout 

The layout of DCCD circuits presented a number of unique problems not
previously encountered by other digital technologies. The two major layout
obstacles were the lack of a standardized symbology for DCCD circuits and the
inability to directly interconnect two physically separated signal points by
means of a metal conductor. The immense difficulty posed by the lack of
schematic circuit representation can be easily appreciated by everyone in
the electronics industry. The difficulty posed by the inability to
directly connect two signal points together with metal is perhaps less
obvious. A DCCD circuit functions by transporting discrete charge
packets along the surface of the Si/SiO2 interface through a set of
sequentially clocked, overlapping MOS gate structures. Consequently,
moving a signal from one physical location to another involves interconnecting
these locations through a series of gates. This method of interconnect has
several major disadvantages from a layout standpoint:

(1)  Additional  refresh  cells  may  be  required  due  to  the  transfer  loss 
of  the  interconnecting  gates. 

(2)  The  delay  associated  with  the  interconnecting  gate  structure 
increases  the  overall  throughput  delay  and  may  require  additional 
delays  elsewhere  in  the  circuit  to  provide  synchronism  between 
interacting  charge  packets. 

(3)  Signal  path  crossovers  are  complex  and  impose  additional  timing 
constraints  on  the  design. 

During the performance of this program, major progress was achieved in
overcoming both of these obstacles. At this writing, a complete system of
DCCD schematic symbols exists, along with a technique for assembling these
symbols into final circuit schematics using the Applicon Computer Aided
Design (CAD) system. This DCCD schematic capability greatly reduces the
opportunity for layout errors by permitting drafting personnel to participate
in the visual check cycle. Prior to this, only the design engineer could
effectively check the layouts.

A technique for interconnecting physically separated points directly
(i.e., without gate structure) was also devised. This technique has been termed
the "charge transfer node," and is described in more detail in Section 4.4
of this report. Basically, the charge transfer node consists of two
diffusions connected by a metal line. Charge is transferred onto the
diffusion at the sending end of the node, causing a current to flow (through
the interconnecting line) to the receiving end (diffusion) of the node.
This current flows for a time which corresponds to the sending node charge
level. A storage well at the receiving node collects this charge, thereby
accomplishing a charge packet transfer through a direct metal connection.
The completeness (efficiency) of this transfer is a function of the time
allowed and the capacitance (including interconnect) of the node. Results
to date indicate that, for frequencies of 5-10 MHz, total node capacitance
can be on the order of 100 fF. This technique has been demonstrated and is
in use on another major DCCD project.

One final layout obstacle encountered by DCCD (as well as other
technologies) was the lack of computer aided design rule checking. As a result
of the complexity presented by the LSI designs being undertaken by this
program, the need for a computer aided design rule check was soon recognized.
Since the inception of the device development phase of this program in 1975,
significant progress has been made towards this end and work continues. For
the past two years, a computer aided design rule check routine written for
bipolar circuits has been utilized to perform a partial check of the DCCD
layouts. A new routine which will provide a complete design rule check on
DCCD layouts has been in development since early 1979. New algorithms have
been created which permit both intra- and inter-level checking. Completion of
this effort is anticipated in early 1980.

2.1.3  Wafer  Processing 

In 1975, when the DCCD device processing activity was first begun, the
existing fabrication technology was P-type, surface channel, employing 7.5
micron, metal/polysilicon gate structures. Since that time, a continuum of
processing improvements has been implemented. These improvements have been
directed at the goals of increased density, higher device operating
frequencies, minimum wafer process complexity, and the achievement of reliable
and repeatable process sequences capable of delivering high DCCD functional
yields.

Inasmuch as the history of DCCD wafer processing is complex and
intricate, a comprehensive summary of these activities is difficult. A
listing of some of the salient changes occurring in DCCD processing since
1975 includes:

(1) Conversion from metal/poly to poly/poly gate structures

(2)  Conversion  from  p-channel  to  n-channel 

(3)  A  general  reduction  in  lithography  from  7.5  micron  to  5  micron 

(4)  Incorporation  of  dry  etching  techniques  in  place  of  entirely 
wet  chemistry  processes 

(5) Replacement of field oxide channel definition with an ion implantation
(channel stop) technique

A  complete  chronological  history  of  the  DCCD  wafer  processing  evolution  is 
provided  in  Section  7.0  of  this  report. 

In  addition  to  the  continued  improvement  in  techniques  and  procedures, 
corresponding  improvements  in  both  lot  size  and  lot  fabrication  time  were 
achieved.  These  improvements  were  a  result  of  streamlined  procedures  and 
efficient  utilization  of  existing  resources.  Increased  lot  sizes  and  reduction 
in  fabrication  time  had  a  significant  impact  on  the  achievements  of  this 
project.  Since  circuit  development  is  iterative,  the  value  of  quick  turn 
around  and  the  opportunity  for  several  experimental  options  per  lot  cannot 
be  overstated. 

Efforts  to  further  improve  present  wafer  process  capabilities  are  still  in 
progress.  At  the  time  of  this  writing,  equipment  has  been  purchased  and 
facilities  changes  initiated  for  further  upgrading  and  automation  of  the  DCCD 
wafer  fabrication  laboratory.  These  improvements  will  provide  the  capability 
for  parallel  plate  plasma  etching;  automatic  photoresist  coating,  developing, 
and scrubbing; dual flash evaporation metallization; microprocessor control of
critical  furnaces;  and  1  micron  resolution  proximity  alignment  with  the 
attendant  yield  improvements  associated  with  non-contact  printing.  These  items 
are  scheduled  for  installation  by  the  end  of  1979. 

2.2  CONCLUSIONS 

The past four years of DCCD development, from the initial building block
phase through the final demonstration phase, have witnessed a continuous
sequence of improvements in the three central areas of device design, layout,
and processing. In retrospect, the absence of a schematic representation for
DCCD circuits and the lack of a comprehensive computer design rule check
capability imposed the greatest obstacles to the timely development of DCCD
circuits.

At this point in time, the basic elements required for the successful
integration of some of the larger DCCD functions, conceived during the
initial study phases of this program, are available. The fundamental logic
and arithmetic DCCD cell, the half adder, has been modeled and has
demonstrated good performance margins at frequencies in excess of 5 MHz.

A system of layout procedures has been developed around the specific
problems presented by DCCD designs. The major aspects of this system are
a comprehensive set of design and layout rules, complete schematic
documentation, the ability to nest large functional cells, an automatic design
rule check procedure, and a DCCD/MOS design and layout manual. With this
system it is currently possible to produce LSI chips with a turn around time
of two to four months, depending on the complexity of the design. Finally,
wafer processing has been developed for DCCD circuits which, with present
design rules, provides repeatable, functioning circuits.

The present design status of DCCD logic cells, exhibiting good noise
margins at frequencies of 5-10 MHz, provides an opportunity to review and
update power and density estimates for this technology. A comparison of DCCD
with other technologies was performed in 1977 using the full adder as
the basis for comparison. These DCCD power estimates have been updated
(based upon the latest design configurations) and are presented in Figure 2-1.
Although an update of the power estimates for the other technologies was not
possible within the schedule constraints of this program, it should be
recalled that the original comparisons were for 5 micron designs. Figure 2-1
shows that, although DCCD power requirements have increased, DCCD still
compares favorably with 5 micron I2L and CMOS. Current industry efforts to
reduce NMOS and CMOS designs from 5 microns to 1-2 microns, if successful,
should reduce the power requirements of these circuits by factors of 10 to 20.
A corresponding scaling of DCCD's would be expected to provide similar power
reductions. Determining the ease with which, and the degree to which, DCCD
circuits can be scaled is a task suggested (in Section 10.0 of this report)
for future program efforts.

Revised area figures for DCCD arithmetic functions (also presented
originally in 1977) are provided in Table 2-1. These areas have, with the
exception of the 16 x 16 multiplier, been obtained from actual circuit
layouts, excluding pads and borders. The 16 x 16 multiplier area has been
extrapolated from the 8 x 8 multiplier area. Again, updated area estimates
for the other technologies could not be obtained for this report, so Table 2-1
reflects the original 5 micron designs.

[Figure 2-1: plot of power dissipation versus clock frequency. Recovered
curve labels: DCCD cascaded dual half-adders (1979); DCCD three-input
full-adder (1979).]

Figure 2-1. Power dissipation versus clock frequency for full
adders constructed from various semiconductor
technologies.


Table 2-1. Estimates for the active area in square millimeters
of various arithmetic functions constructed from
different semiconductor technologies.

Technology    16 + 16    8 x 8    16 x 16    32 + 32

DCCD            6.0        7.0      38.5       20.5
P-MOS          11.3       12.2      67.7       49.2
N-MOS           7.8        7.7      44.2       34.7
CMOS           16.5       19.5     104         70.2
I2L            14.9       26.2     137         64.9

Table 2-1 indicates that, in spite of a factor of two growth, DCCD still
compares favorably with other 5 micron technologies from a density
standpoint. The areas presented in Table 2-1 assume the use of the DCCD
full adder cell. If these full adders were implemented with dual half
adders, the DCCD area values would increase by approximately a factor of
two. Density comparisons, such as presented by Table 2-1, will require
frequent review as the present industry drive for micron and submicron
designs progresses.
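
As a rough aid to reading Table 2-1, the sketch below expresses each technology's area as a multiple of the DCCD area for the same function, and shows the effect of the approximate factor-of-two penalty for dual half adders. The numbers are the Table 2-1 values as read from the table; the pairing of values with rows and columns is taken from the table layout and should be treated as an assumption, and the names AREAS_MM2 and area_ratios are hypothetical.

```python
# Active areas in square millimeters, as read from Table 2-1.
# Assumption: row/column pairing follows the table layout.
AREAS_MM2 = {
    "DCCD":  {"16+16": 6.0,  "8x8": 7.0,  "16x16": 38.5,  "32+32": 20.5},
    "P-MOS": {"16+16": 11.3, "8x8": 12.2, "16x16": 67.7,  "32+32": 49.2},
    "N-MOS": {"16+16": 7.8,  "8x8": 7.7,  "16x16": 44.2,  "32+32": 34.7},
    "CMOS":  {"16+16": 16.5, "8x8": 19.5, "16x16": 104.0, "32+32": 70.2},
    "I2L":   {"16+16": 14.9, "8x8": 26.2, "16x16": 137.0, "32+32": 64.9},
}


def area_ratios(areas, baseline="DCCD", baseline_scale=1.0):
    """Area of each technology divided by the (optionally scaled) baseline area."""
    base = areas[baseline]
    return {
        tech: {fn: round(a / (base[fn] * baseline_scale), 2) for fn, a in row.items()}
        for tech, row in areas.items()
        if tech != baseline
    }


full_adder_case = area_ratios(AREAS_MM2)                      # DCCD full adder cells
half_adder_case = area_ratios(AREAS_MM2, baseline_scale=2.0)  # approx. 2x DCCD area

for tech, row in full_adder_case.items():
    print(tech, row)
```

A ratio above 1 means the other technology needs more area than DCCD for that function. Under the factor-of-two assumption for dual half adders, several ratios fall below 1, which is consistent with the caution above that these comparisons need frequent review.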
3.0 SMALL SCALE INTEGRATION (SSI) DEVELOPMENT


3.1  INTRODUCTION 

Digital  logic  functions  in  CCD's  can  be  implemented  with  the  same 
fabrication  techniques  as  used  in  standard  analog  CCD's.  A  logical  one  is 
simply  defined  as  a  charge  quantity  which  is  equal  to  the  capacity  of  a 
minimum  geometry  storage  electrode,  and  a  logical  zero  is  defined  as  an  empty 
storage  electrode. 

The  logical  OR  function  is  the  easiest  function  to  implement.  The 
logical  OR  function  is  shown  in  Figure  3-1. 


[Figure 3-1: OR gate diagram with inputs A and B.]

Figure 3-1. DCCL OR Gate.

When a logical one is transferred from either the A or the B input under
a common storage electrode, the OR function occurs. In this simple OR gate,
the common storage electrode will contain a charge quantity which is twice
that of a logical one when both A and B are ones. This condition can be
corrected by providing a potential barrier and charge sink for the excess
charge as shown in Figure 3-2.


Figure  3-2.  DCCL  OR  Gate  with  Correction  for  1  +  1  Logic. 

Realizing that the charge which is discarded is the AND function of A
and B, it is a natural extension of the basic OR gate to form an AND
gate. As shown in Figure 3-3, an AND function is implemented by saving the
charge which spills over the barrier electrode and sinking the OR function
on an alternate clock phase.


Figure  3-3.  DCCL  AND  Gate. 
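
The OR and AND behaviour described above follows from simple charge bookkeeping, which the sketch below illustrates. It is an idealized model only: charge is counted in units of one logical "1" packet, transfer is assumed lossless, and the names FULL_WELL and merge_packets are hypothetical.

```python
FULL_WELL = 1  # capacity of a minimum geometry storage electrode, in "1" packets


def merge_packets(a, b, capacity=FULL_WELL):
    """Merge two charge packets under a common storage electrode.

    Returns (retained, spilled): the charge left under the electrode and
    the excess charge that spills over the barrier electrode.
    """
    total = a + b
    retained = min(total, capacity)  # what the well can hold: A OR B
    spilled = total - retained       # excess over the barrier: A AND B
    return retained, spilled


for a in (0, 1):
    for b in (0, 1):
        retained, spilled = merge_packets(a, b)
        print(f"A={a} B={b}  OR={retained}  AND={spilled}")
```

Which of the two packets is kept and which is routed to a charge sink determines whether the structure acts as an OR gate or an AND gate.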
The AND gate may be altered to perform the exclusive-OR function as
shown in Figure 3-4.


Figure  3-4.  DCCL  Exclusive-OR  Gate. 

The schematic of a DCCL exclusive-OR gate is shown in Figure 3-4. As
described above, if either of the two input channels transfers a binary "1"
into the D storage area, it will transfer across t and s to produce the
function $A\overline{B} + \overline{A}B$. However, when both input channels
transfer binary "1" packets into the D storage area so that it overflows
across b1 to fill M, the resulting change in surface potential under M induces
a potential change on the floating-gate or floating-diffusion. The potential
change is transferred to the s end of the floating-gate, causing it to switch
from a transfer level to a charge barrier level. Now when t is switched to a
positive voltage, the charge packet transferring from D will be retained under
t, with s acting as a barrier. During the next clock phase, when c is switched
to a positive voltage, the packet of charge held in t will transfer out under
c to produce the function C = AB, and no charge will transfer out of S0.
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The  DCCL  exclusive-OR  and  the  half-adder  are  identical,  the Sq  output 
is  the  SUM  and  the  C  output  is  the  CARRY  to  the  next  level  as  shown  in 
Figure  3-5.  A  full-adder  can  be  implemented  by  increasing  the  input 
channels  to  three,  and  by  adding  the  b ^  and  1  areas  as  shown  in  Figure 
3-6.  The  SI  and  S2  are  connected  together  in  a  DCCL  OR  gate  to  form  the 
SUM  output. 

DCCL is a dynamic system; the D, M, and I storage areas must be completely
emptied of charge between input charges. If any of the three specific
functions is not required for further signal processing, the charge packet
at that exit port must terminate at a sink diode.

It is a simple matter to modify a half-adder DCCL cell so that it will
perform the digital refresh function. All that is required is that a diode
fill and spill gate be arranged to insert a binary "1" into the D storage
cell on each clock phase, synchronous with the signal charge packet in the
other channel.

If the B input channel is designated the continuous "1" channel and channel
A the input signal channel, then the SUM and CARRY outputs become
$S = \bar{A}$ and C = A. Since the C = A charge packet is a binary "1" only
when D overflows, it must be a completely full charge packet no matter what
the quantity of charge in the original A input; thus A is refreshed.
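The half-adder and refresh behavior can be sketched with a small
charge-steering model. This is a simplified illustration under stated
assumptions, not the report's circuit model: FULL, THRESHOLD, half_adder and
refresh are hypothetical names, and THRESHOLD (the overflow needed to switch
S to a barrier) is an assumed value.

```python
FULL = 1.0       # assumed capacity of the D storage area (one full packet)
THRESHOLD = 0.5  # hypothetical overflow (fraction of FULL) that switches S to a barrier


def half_adder(a: float, b: float):
    """Return (sum_charge, carry_charge) for input packets a and b."""
    total = a + b
    in_d = min(total, FULL)            # D holds at most one full packet
    overflow = max(total - FULL, 0.0)  # the excess spills over the barrier into M
    if overflow >= THRESHOLD * FULL:   # sensed charge switches S to a barrier
        return 0.0, in_d               # the packet from D leaves through the carry port
    return in_d, 0.0                   # otherwise it passes S to the sum port


def refresh(a: float) -> float:
    """Refresh cell: B is a constant full '1'; the carry output regenerates A."""
    _, carry = half_adder(a, FULL)
    return carry


print(half_adder(FULL, 0.0), half_adder(FULL, FULL), refresh(0.8), refresh(0.2))
```

In this sketch an attenuated "1" (for example 0.8 of a full packet) still
produces a full carry packet, because the carry charge comes from the full D
storage area rather than from the attenuated input.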

Figure 3-5. DCCL Half-Adder.

Figure 3-6. DCCL Full-Adder. (The figure includes a truth table for the
three inputs and the sum and carry outputs; CARRY OUT = AB + AC + BC.)


The  following  paragraphs  provide  a  detailed  description  of  the  evolution 
and  development  of  the  basic  DCCD  building  blocks  which  form  the  nucleus  of 
subsequent  LSI  circuit  designs. 

3.2  FLOATING  GATE/FLOATING  DIFFUSION  DESIGNS  AND  RESULTS 

In 1972 M. F. Tompsett published a description of a CCD charge
regenerator that used a floating-diffusion as the sensing-node of a charge
transfer electrode. From the first full-adder design on the LSM-1 layout,
made in early 1975, TRW used a floating-gate as the charge sensing element.
This design requires a FET switch for resetting the floating gate to a
preset transfer level at the beginning of each cycle and a gated sink diode
to remove charge from under the floating-gate at the end of each cycle.

The floating-gate was used as a sensing-node until 1976. In July 1974,
T. A. Zimmerman mentioned the floating diffusion as an alternative in a
paper containing the description of a digital CCD exclusive-OR gate. The
floating diffusion approach provided a significant improvement in the
sensitivity of the charge sensing element. Eliminating the capacitance
associated with the floating-gate and gated sink diode reduced the total
parasitic capacitance loading on the charge transfer switch, which was the
primary factor in increasing its sensitivity.

In  September  1976,  at  the  start  of  the  DP3  design,  it  was  realized  that 
by  using  the  source  diffusion  of  the  FET  switch  as  the  sensing-node  the 
FET  would  remove  the  unwanted  charge  on  the  node  at  the  end  of  the  cycle 
while  simultaneously  resetting  it  to  the  preset  transfer  level.  Subsequent 
designs  utilized  the  floating  diffusion  approach  for  all  DCCD  logic  circuit 
charge  sensing  elements. 

3.3  FULL-ADDER  DESIGNS  AND  RESULTS 

A full-adder is the basic arithmetic function used in many parallel
adders, subtractors and multipliers. It has three inputs and two outputs.
All three inputs have the same binary significance; usually two inputs are
addends while the third input is the carry-bit from a less significant
full-adder. The outputs are the sum-bit and the carry-bit, as shown in
Table 3-1.


Table 3-1. Full-Adder Truth Table.

       Inputs          Outputs
    A1    B1    C1     S1    C2
     0     0     0      0     0
     0     0     1      1     0
     0     1     0      1     0
     0     1     1      0     1
     1     0     0      1     0
     1     0     1      0     1
     1     1     0      0     1
     1     1     1      1     1

A  description  of  how  a  full-adder  performs  the  arithmetic  function  is 
given  in  Section  3.1. 
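The truth table can be checked against ordinary binary addition: the three
equally weighted inputs are counted, the sum bit is the count modulo 2, and
the carry bit is the count divided by 2. The sketch below is illustrative
only; full_adder and table are names introduced here.

```python
from itertools import product


def full_adder(a1: int, b1: int, c1: int):
    """Return (S1, C2) for three one-bit inputs of equal binary significance."""
    total = a1 + b1 + c1          # number of inputs that are 1 (0 to 3)
    return total % 2, total // 2  # sum bit, carry bit


# Rows are generated in the same order as Table 3-1.
table = [(a, b, c, *full_adder(a, b, c)) for a, b, c in product((0, 1), repeat=3)]
for row in table:
    print(*row)
```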

The first full-adder design was implemented in March 1975 on the LSM-1
mask set using design concepts published eight months earlier. A schematic
diagram of this early full-adder is shown in Figure 3-7.

In common with all of the circuits included on the LSM-1 wafer, a metal/
polysilicon gate configuration was employed, with gate oxides of 1000 Å and
2000 Å under the polysilicon gate and metal gate structures, respectively.
The field oxide was 15,000 Å thick.

The polysilicon charge transfer electrode was precharged to create a
transfer mode surface potential by applying a voltage pulse to a capacitor
formed by the metal-oxide-polysilicon charge transfer electrode. Two
problems prevented this full-adder from functioning properly.

The  first  problem  was  a  race  condition  that  occurs  when  two  or  more  of 
the  inputs  have  a  binary  value  of  "1"  (full  charge  packets);  in  this  mode, 
both  the  "D"  and  "M"  storage  areas  are  filled.  The  charge  packet  under  the 
master-side,  "M",  of  the  charge  transfer  electrode  causes  the  surface  potential 
of  the  slave-side,  "N",  to  switch  from  its  initial  transfer  state  to  a 
barrier  (blocking)  state.  One  deficiency  of  this  design  was  that  a  part  of  the 
charge  packet  that  should  have  filled  M  also  transferred  into  N.  Consequently, 
as  the  surface  potential  of  the  charge  transfer  electrode  changed  from  a 
transfer  state  to  a  barrier  state,  the  charges  under  N  were  forced  to 
either  the  sum  state  or  to  the  sink  diode. 

A second problem existed because of a fixed charge that is produced at
the polysilicon-SiO2 interface as the oxide is grown. The variability of
this process-induced charge made it extremely difficult to accurately
preset the charge transfer electrode to the required transfer surface
potential.

Figure 3-7. Implementation of an LSM-1 Full Adder.

Both problems were eliminated in the full-adder design completed in
August 1975 for the DP-0 evaluation wafer. The race condition was eliminated
by adding a transfer gate between the "D" input storage gate and the slave-
node, "N", of the charge transfer electrode. The initial process-induced
charge on the charge transfer electrode was removed by connecting the source
of a small FET to the charge transfer electrode and its drain to the
substrate, then switching the FET "on" to remove the charge just prior to
inducing the required potential on the charge transfer electrode via the
precharge capacitance. These two modifications produced a full-adder that
performed all the correct arithmetic functions; however, the yield of good
devices was low due to metal line breakage over oxide-covered polysilicon
steps. With the first successful demonstration of a CCD full-adder in
November 1975, the decision was made to use it as a building block in the
3x3 multiplier and 4+4 adder; this chip was designated DP1. The layout of
the full-adder was eventually modified to provide identical cells that would
fit together topologically on the Applicon System; additional gates were
added so that the full-adder cells would interconnect with the appropriate
clock phases.

In order to digitize the multiplier and adder arrays, the sum output
channel was aligned so that it could be connected with one of the three inputs
to the next cell. The carry output channel had to be aligned with one of the
three inputs of the full-adder cell in the next more significant column of
the array.

Two conceptual layout changes were made to the DP1 full-adder to correct
anomalies that occurred in testing. An additional storage area was placed
between the "I" and "S" storage areas (see Figure 3-7) in order to correct
the phasing of the two sum packets at the sum output "OR-gate." The SINK
diode output port was moved from the slave-node, "N", of the charge transfer
electrode to the "D" input storage area, thus enabling the slave-node to be
correctly used only as a barrier or transfer gate.

The DP1 full-adder, multiplier and adder arrays used 1000 Å polysilicon
gate oxides and 2000 Å metal gate oxides to direct charge flow in a typical
2-phase CCD shift register. The DP1 full-adder, 4+4 adder and 3x3
multiplier were all successfully demonstrated at clock speeds up to 175 kHz
in February 1976.

3.3.1 Full Adders Implemented from Dual Half Adders

Early  in  1976,  we  reviewed  the  capabilities  of  half-adders  and  full-adders 
and  concluded  that  the  half-adder  had  the  potential  for  higher  speeds  and 
higher  charge  transfer  electrode  sensitivity. 

A full-adder can be easily implemented from two cascaded half-adders,
two one-bit shift-registers and an OR-gate, as shown in Figure 3-8. The
validity of the Figure 3-8 configuration can be shown as follows. A
half-adder accepts two inputs, a and b, and produces a sum S = 1 if either
input is 1, but not when both inputs are 1. The carry C = 1 if both inputs
are 1. Hence $S = a \oplus b$ and $C = ab$. A full-adder accepts three inputs
and produces a sum S = 1 when one or all three inputs are 1; thus, in logical
terms, $S = g \oplus (a \oplus b)$. A carry C = 1 is generated when two or
three inputs are 1's: $C = g(a \oplus b) + ab$. Hence, a full-adder can be
realized using two half-adders plus an OR gate, as shown in Figure 3-8.

Figure 3-8. A Full-Adder Logic Cell Implemented with
Dual Cascaded Half-Adders.
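The logical argument above can be verified exhaustively. The sketch below is
a purely logical model under the assumption that the one-bit shift-registers
only align timing, so they are not modeled; the function names are
introduced here for illustration.

```python
from itertools import product


def half_adder(a: int, b: int):
    """Return (sum, carry) = (a XOR b, a AND b)."""
    return a ^ b, a & b


def full_adder_from_half_adders(a: int, b: int, g: int):
    """Two cascaded half-adders plus an OR gate on the two carries."""
    s1, c1 = half_adder(a, b)  # first half-adder: a XOR b, ab
    s, c2 = half_adder(g, s1)  # second half-adder: g XOR (a XOR b), g(a XOR b)
    return s, c1 | c2          # the OR gate merges the two carry outputs


# Check every input combination against ordinary binary addition.
for a, b, g in product((0, 1), repeat=3):
    s, c = full_adder_from_half_adders(a, b, g)
    assert 2 * c + s == a + b + g
```

The two carries can never both be 1 (if ab = 1 then a XOR b = 0), so a
simple OR gate is sufficient to combine them.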


Two separate 8+8 adder arrays were included on the next mask set
(designated DP2). One adder used the original full-adder, while the other
used cascaded half-adders.

The single DP2 full-adder was similar to the DP1 design, with the exception
that the additional storage cell between the "I" storage area and the "S"
OR-gate was removed. The "S" storage area was used for storage of the sum
charge packet that occurs via the slave-node when a single input is a binary
"1", and also when the charge packet provided by the master-node is such that
all three inputs are a binary "1".

Both the half-adders and the full-adders used on DP2 designs were
further simplified from earlier DP1 designs by eliminating the precharge
capacitance. The DP2 design used a FET to preset the charge sense electrode
to the transfer state as well as to remove any initial charge from a
preceding bit.

A significant DCCD technology advance was made on the DP2 series by
changing from a metal-polysilicon gate combination to a double polysilicon
gate structure. However, processing difficulties were experienced with the
slip mask fabrication technique then in use, and few working DP2 devices
were available for testing.

Changing the location of the output port of the carry-out charge packet
from under the master-side of the charge sensing electrode to the "D" input
storage gate generated a full charge packet each time there were two or
more binary "1" inputs to the D input area. An automatic charge refresh
capability was thereby provided with each half-adder and full-adder of
the DP3 design.

Another major improvement included on the DP3 full-adder was the
elimination of the floating-gate capacitance as the charge sensing element
in favor of the floating-diffusion design. This was accomplished by replacing
the oxide capacitance with the depletion capacitance of the precharge FET
source.

Both  the  16  +  16  adder  and  the  8x8  multiplier  were  designed  from  full- 
adders  implemented  using  two  cascaded  half-adders  as  shown  in  Figure  3-8. 

Two separate full-adder test cells were included on the DP3 design; both were
of the cascaded half-adder type; one had a floating-gate as the charge
sensing node and the other had a floating-diffusion.

Tests were performed on both full-adders. It was determined that the
full-adder designed with the floating-gate charge sensing node had lower
transfer efficiency than the full-adder employing floating-diffusion. The low
transfer efficiency of the floating-gate version was traced to connection of
the sink diode gate and carry-out gate to the same clock line. This approach
was used in an effort to reduce the number of clock lines required to operate
the adder. The floating-diffusion design has no sink diode since the precharge
FET removes any residual charge. This feature also eliminates the possibility
of the condition described previously in Section 3.3. A decision was made at
that time to standardize on the use of the floating-diffusion design.

A cascaded full-adder comprised of two floating-diffusion charge sensing
elements performed arithmetic operations correctly at clock speeds up to 2
MHz.

Additional details of the DP0 and DP1 designs and test results are
published (4) in Vol. III, and details of the DP2 and DP3 full-adders are
published in Vol. IV of this series.

3.3.2 Conversion to N-Channel Designs

In December 1977, an effort was made to achieve higher clock speeds by
replacing the existing two-phase, P-channel circuits with the inherently
faster three-phase N-channel designs.

An N-channel evaluation mask, designated NE-1, was designed. It contained
a single binary level full-adder as well as many other DCCD evaluation
devices. This full-adder was shown to perform correct arithmetic functions in
February 1978. However, the gain of the charge sense electrode was too low
to permit cascading of full-adders to achieve a working multiplier or adder
array.

Consequently, the full-adder was redesigned for higher sense electrode
gain and incorporated into the DP5 mask design. A schematic diagram of the
DP5 full-adder is shown in Figure 3-9 and a photograph of the processed device
is shown in Figure 3-10.

The cascaded dual half-adder approach to a full-adder function has several
system application advantages over a single full-adder (see Section 3.4.3). It
was therefore decided to use only cascaded half-adders in future LSI
applications. Only minimal tests were performed on the DP5 full-adder.
Sufficient tests were carried out to show that it performed all of the correct
arithmetic functions at clock speeds up to 800 kHz and that its switching
margin agreed with that predicted by the model (i.e., acceptance of 25% of a
full charge packet as a binary "zero" and 75% of a full charge packet as a
binary "one").

The  last  full-adder  was  included  in  the  design  of  the  Azimuth  Correlator 
Program,  described  in  Section  9.3.  The  ACD  full-adder  is  of  the  cascaded 
dual  half-adder  type  and  was  used  in  a  10  +  10  adder.  A  schematic  of  the  ACD 
full-adder  is  shown  in  Figure  3-11;  it  was  shown  to  operate  correctly  at  clock 
speeds up to 1.25 MHz (see Section 9.3 for results).

Figure 3-11. Implementation of the ACD-0 Full-Adder. This is a
4-Phase, Double-Polysilicon, N-Channel Device Formed
from 2 Half-Adders, 2 1-Bit Delays and an OR-Gate.

3.3.3 Computer Model of a Full-Adder

Digital components such as full-adders, exclusive-OR gates and logic
inverters have a single common element: the electrode that senses and
logically controls the direction of charge transfer. It is essential that
this charge sense electrode operate correctly with large variations in the
size of input charge packets. During the design phase of the DP5 full-adder,
we formulated mathematical and computer models of the charge sense electrode
which accurately predicted the operational margin of these digital CCD
components. Clock voltages, maximum tolerable temperature and operating
frequency were also derived from the computer model.

Although the model was designed specifically for a single full-adder,
it can be adapted for other digital CCD logic circuits. The computer model
predicted the switching margins of the DP5 full-adder as well as the ACD
half-adder.

A capacitor model of the charge transfer electrode is shown in Figure
3-12. The computer model was used to calculate the depletion capacitance,
$C_d$, the gate oxide capacitance, $C_{ox}$, and the total parasitic
capacitance, $C_p$. The basis for these calculations included the digitized
area parameters and semiconductor processing parameters such as oxide
thickness, bulk impurity concentration, and fixed oxide charge density
($Q_{ss}$).

The total parasitic capacitance is the sum of many small parasitic
capacitances, such as the adjacent gate overlap capacitance, the metal strap
to substrate capacitance and the slave node gate to channel stop overlap
capacitance.

The computer program also takes into account the expansion or shrinkage
of areas due to over-exposure or etching during processing in accordance
with the factors listed in Table 3-2. Calculations can be based on maximum,
nominal, or minimum shrinkage values. An RSS (root-sum-square) program is
available that calculates the RSS variation of every surface potential based
on the fluctuations of Table 3-2.
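A root-sum-square combination of this kind can be sketched as follows. This
is not the report's RSS program: the sensitivity coefficients, the choice of
half the (max - min) range as each parameter's excursion, and all names are
assumptions made for illustration. Only the (min, nominal, max) values are
taken from Table 3-2, whose units are not stated in the source.

```python
import math

# (min, nominal, max) values transcribed from Table 3-2; units as in the source
TABLE_3_2 = {
    "poly1_shrinkage": (0.024, 0.029, 0.034),
    "poly2_shrinkage": (0.014, 0.019, 0.024),
    "n_plus_lateral_diff": (0.026, 0.030, 0.034),
    "p_plus_lateral_diff": (0.045, 0.050, 0.055),
    "metal_shrinkage": (0.025, 0.030, 0.035),
    "n_plus_overetch": (0.004, 0.0045, 0.005),
    "threshold_variation": (0.0, 0.05, 0.1),
    "p_plus_overetch": (0.020, 0.024, 0.028),
}


def rss_variation(sensitivities: dict, table: dict = TABLE_3_2) -> float:
    """Root-sum-square of the per-parameter contributions.

    `sensitivities` maps a parameter name to a hypothetical coefficient
    d(surface potential)/d(parameter). Each parameter's excursion is
    assumed to be half of its (max - min) range.
    """
    total = 0.0
    for name, sens in sensitivities.items():
        low, _, high = table[name]
        half_range = (high - low) / 2.0
        total += (sens * half_range) ** 2
    return math.sqrt(total)


# Hypothetical sensitivities, chosen only to exercise the function
print(rss_variation({"poly1_shrinkage": 3.0, "poly2_shrinkage": 4.0}))
```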

For a given amount of charge ($\Delta Q$) spilling across the barrier to the
charge transfer electrode, the change in voltage at the charge transfer
electrode is defined as:

$$\Delta V_G = \frac{\Delta Q}{C_F}$$

where $C_F$ is the total capacitance on the charge transfer electrode:

$$C_F = C_p + \frac{C_{ox}\,C_d}{C_{ox} + C_d}$$

$C_p$ = parasitic capacitances including the FET source depletion capacitance
$C_{ox}$ = oxide capacitance under the slave node of the charge transfer node
$C_d$ = depletion capacitance under the slave node of the charge transfer node

The change in charge transfer electrode surface potential ($\Delta\phi_s$)
is given by:

$$\Delta\phi_s = \Delta V_G' + V_0 - \left(2\,\Delta V_G'\,V_0 + V_0^2\right)^{1/2} = \alpha\,\Delta V_G$$

where

$$\Delta V_G' = \Delta V_G - V_{FB}, \qquad V_0 = \frac{q N_A \varepsilon_s}{C_{ox}^2}$$

Thus

$$\Delta\phi_s = \frac{\alpha\,\Delta Q}{C_F}$$

Figure 3-12. Capacitor Model of the Charge Transfer Electrode.

Table 3-2. Process Shrink/Growth Factors.

PARAMETER                      MIN      NOM      MAX
X(1) Poly 1 Shrinkage          .024     .029     .034
X(2) Poly 2 Shrinkage          .014     .019     .024
X(3) N+ Lateral Diff.          .026     .03      .034
X(4) P+ Lateral Diff.          .045     .05      .055
X(5) Metal Shrinkage           .025     .03      .035
X(6) N+ Overetch               .004     .0045    .005
X(7) Threshold Variation       0        .05      .1
X(8) P+ Overetch               .02      .024     .028

Figure 3-13 shows the surface potential diagram for the T cell - charge
transfer electrode system.
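The capacitor model can be exercised numerically. The sketch below assumes
the reading in which $C_{ox}$ and $C_d$ are in series and that combination is
in parallel with $C_p$; the function names and every numeric value are
hypothetical and serve only to show how the quantities combine.

```python
Q_E = 1.602e-19  # electron charge, coulombs


def total_capacitance(c_p: float, c_ox: float, c_d: float) -> float:
    """C_F = C_p + (C_ox in series with C_d)."""
    return c_p + (c_ox * c_d) / (c_ox + c_d)


def surface_potential_change(delta_q: float, c_f: float, alpha: float) -> float:
    """Delta phi_s = alpha * Delta Q / C_F."""
    return alpha * delta_q / c_f


# Hypothetical capacitances (farads), spilled charge and alpha
c_f = total_capacitance(c_p=200e-15, c_ox=300e-15, c_d=150e-15)
delta_q = 1.0e6 * Q_E  # one million electrons spilled onto the electrode
d_phi = surface_potential_change(delta_q, c_f, alpha=0.8)
print(c_f, d_phi)
```

Because $C_F$ is in the denominator, any reduction in parasitic capacitance
directly increases the surface potential swing produced by a given spilled
charge.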

Where:

$\phi_R$ = The preset surface potential of the charge transfer electrode as
set by the FET.

$\phi_N$ = The surface potential of the charge transfer electrode when the
two charge packets into the "D" cell are a full binary "one" ($Q_0$) and a
"fat zero" ($Q_N$), or two "fat zero" charge packets in the case of a three
input port full-adder. The full binary "one" charge packet will transfer onto
the "T" gate and the fat zero packet(s) will spill across the barrier to the
charge transfer electrode, causing the surface potential to change to
$\phi_N$.

$\phi_Y$ = The surface potential of the charge transfer electrode when the
two charge packets into the "D" cell are "thin" (attenuated) binary "ones"
($2Q_Y$). A full binary "one" charge ($Q_0$) will transfer into the "T" gate
and the remainder of the two "thin" packets, $Q_1 = 2Q_Y - Q_0$, transfers
across the barrier to the charge transfer electrode, causing the surface
potential to change to $\phi_Y$.

$\phi_0$ = The surface potential on the charge transfer electrode when a full
binary "one" charge packet ($Q_0$) transfers across the barrier onto the
charge transfer electrode.

$\phi_P$ = The initial surface potential of an empty "T" cell as set by the
most positive voltage level of the clock signal.

$\phi_1$ = The surface potential of the "T" cell when a full binary "one"
charge packet ($Q_0$) is transferred into it.

It can be seen that the amount of blocked charge ($\Delta Q_b$) is expressed
as:

$$\Delta Q_b = C_T\left(\Delta\phi_s - (\phi_1 - \phi_R)\right)$$

$$\Delta Q_b = \frac{C_T}{C_F}\,\alpha\,\Delta Q - \Delta\,C_T$$

where

$$\Delta = \phi_1 - \phi_R \quad\text{and}\quad \Delta\phi_s = \frac{\alpha\,\Delta Q}{C_F}$$

Figure 3-13. Surface Potential Diagram of the T-Cell to
Charge Transfer Electrode Interface.

The desired result is that a spilled charge $\Delta Q = Q_N$ gives
$\Delta Q_b \le 0$, so that no charge is blocked, and that a spilled charge
$\Delta Q = Q_1$ gives $\Delta Q_b \ge Q_0$, so that the full packet is
blocked, with a positive surface potential margin retained in each case.

Mathematically, $\Delta Q_b \le 0$ (when $\Delta Q = Q_N$) when the charge
transfer node does not present a barrier. The largest swing in the charge
transfer node (that allows $\Delta Q_b \le 0$) occurs when $\Delta Q_b = 0$.
The amount of charge on the charge transfer electrode required to do this is:

$$\Delta Q = \frac{\Delta\,C_F}{\alpha}$$

Mathematically, $\Delta Q_b \ge Q_0$ (when $\Delta Q = Q_1$) when the charge
transfer electrode presents a total barrier. The smallest swing in the charge
transfer electrode occurs when $\Delta Q_b = Q_0$. The amount of charge on
the charge transfer node required to do this is:

$$\Delta Q = \frac{(Q_0 + \Delta\,C_T)\,C_F}{\alpha\,C_T}$$

$$\Delta Q = \frac{Q_0\,C_F}{\alpha\,C_T} + \frac{\Delta\,C_F}{\alpha}$$

To maximize the switching margin of the "T" cell - charge transfer
electrode system, $C_F/(\alpha C_T)$ must be at a minimum.

The computer program calculates the size of the full charge packet ($Q_0$)
from the area of the "D" cell, the maximum clock voltage applied to the D
cell and the potential of the barrier gate.

The computer program also calculates the total noise charge packet ($Q_N$)
by summing three separate components. The first of these noise components
is the residual charge that remains after a full charge packet has been
attenuated. The second component is the charge packet that crosses the
potential barrier of a previous arithmetic cell, due to low threshold voltage.
The third component is due to accumulation of dark current.

For the DP5 full-adder, the full charge packet was calculated to be 3.2
million electrons and the noise charge packet was calculated to be one million
electrons.
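The blocked-charge relation defines a window of spilled charge between "no
barrier" and "total barrier". The sketch below evaluates that window using
the DP5 packet sizes quoted above. It assumes $C_T$ is the capacitance of
the "T" cell and uses hypothetical values for $C_T$, $C_F$, $\alpha$ and the
potential offset $\Delta$; none of these values come from the report.

```python
Q_E = 1.602e-19  # electron charge, coulombs


def blocked_charge(delta_q, c_t, c_f, alpha, delta):
    """Delta Q_b = (C_T / C_F) * alpha * Delta Q - delta * C_T (not clipped)."""
    return (c_t / c_f) * alpha * delta_q - delta * c_t


def switching_window(q0, c_t, c_f, alpha, delta):
    """Return the spilled charges at which Delta Q_b = 0 and Delta Q_b = Q_0."""
    no_block = delta * c_f / alpha
    full_block = q0 * c_f / (alpha * c_t) + no_block
    return no_block, full_block


# DP5 packet sizes from the text; every other value below is hypothetical.
q0 = 3.2e6 * Q_E
q_n = 1.0e6 * Q_E
c_t, c_f, alpha, delta = 1200e-15, 300e-15, 0.8, 0.5

low, high = switching_window(q0, c_t, c_f, alpha, delta)
print(low / Q_E, high / Q_E)  # window limits expressed in electrons
```

The width of the window is $Q_0 C_F / (\alpha C_T)$, so a large "T" cell and
a small total electrode capacitance both narrow it.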

The computer program successively adjusts the preset surface level of
the charge transfer electrode so that the two surface potential margins
(transfer and barrier) are equal. The computer RSS program for the DP5
full-adder calculated both margins as 213 mV.


3.3.4 Advantages of Full-Adder Over Dual Half-Adder Implementations

3.3.4.1 Power Dissipation

The power consumed in a DCCL is only that power required to charge the
gate capacitance to each clock voltage level. The capacitance of the φ1 clock
line to the full-adder is approximately 1.8 times that of a half-adder; with
the exception of that clock line, all other capacitances are identical. This
additional capacitance causes a full-adder to dissipate 20% more power than a
single half-adder. However, when dual half-adders are used to implement a
full-adder function, this configuration requires two one-bit shift-registers
and an OR gate. These additional elements, added to the two half-adders, result
in an overall power dissipation that is 2.5 times that of a single full-adder.
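The 20% figure can be reproduced with a simple clock-power estimate. The
sketch assumes that clock power scales as C V^2 f summed over the clock
lines, that the half-adder has four clock lines of equal capacitance, and
that all lines swing the same voltage. The number of lines and their
equality are assumptions made here, not statements from the report.

```python
def relative_power(line_caps, v_clock: float = 1.0, f_clock: float = 1.0) -> float:
    """Clock power, proportional to sum(C) * V^2 * f."""
    return sum(line_caps) * v_clock ** 2 * f_clock


half_adder_lines = [1.0, 1.0, 1.0, 1.0]  # hypothetical: four equal clock lines
full_adder_lines = [1.8, 1.0, 1.0, 1.0]  # phi-1 line is 1.8x; the others are identical

ratio = relative_power(full_adder_lines) / relative_power(half_adder_lines)
print(ratio)
```

Under these assumptions one line at 1.8 times the capacitance adds 0.8 of a
line out of four, which is the 20% increase quoted in the text.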

3.3.4.2 Transfer Efficiency

It is not feasible to use a "fat-zero" in implementing arithmetic functions
with DCCL's. As a result, transfer efficiencies of only 0.998 per transfer
are achievable. There are two transfers through a full-adder, resulting in a
transfer efficiency of 0.996. In a dual half-adder configuration there are
four transfers, producing a transfer efficiency of 0.992. In the layout of a
large pipeline arithmetic array it will therefore be necessary to insert a
level of charge refresh cells twice as frequently when dual half-adder
configurations are used.
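The efficiency figures follow from compounding the per-transfer efficiency.
The sketch below also estimates how many cascaded cells a packet can pass
before it falls below a chosen fraction of a full packet; the function names
and the 0.75 fraction used in the example are illustrative assumptions.

```python
import math

ETA = 0.998  # per-transfer efficiency quoted in the text


def cell_efficiency(transfers: int, eta: float = ETA) -> float:
    """Fraction of a packet that survives the given number of transfers."""
    return eta ** transfers


def cells_before_refresh(transfers_per_cell: int, min_fraction: float,
                         eta: float = ETA) -> int:
    """Largest number of cascaded cells that keeps the packet >= min_fraction."""
    return math.floor(math.log(min_fraction) / (transfers_per_cell * math.log(eta)))


print(round(cell_efficiency(2), 3), round(cell_efficiency(4), 3))
print(cells_before_refresh(2, 0.75), cells_before_refresh(4, 0.75))
```

Doubling the number of transfers per cell roughly halves the number of cells
between refresh levels, which matches the statement that refresh cells are
needed twice as frequently in the dual half-adder configuration.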

3.3.4.3 Pipeline Delays

Both full-adders and half-adders perform a single mathematical function
in one clock cycle. Therefore, a full-adder implemented from two cascaded
half-adders requires two clock cycles to perform the full-adder function. An
adder or multiplier designed from full-adders will impose half as many
pipeline delays on the system as the same adder or multiplier designed from
cascaded dual half-adders. This is significant where maximum circuit density
is of primary importance.

3.4  HALF-ADDER  DESIGNS  AND  RESULTS 

A  basic  description  of  the  half  adder  and  how  it  performs  arithmetic 
functions has been previously provided in Section 3.1. For reference, the

The DP-0 wafer used metal gates over a 2000 Å gate oxide, a single
polysilicon gate level over a 1000 Å gate oxide, P-channel technology, and a
15,000 Å field oxide. The DP-0 half-adder design did not provide useful
devices due to metal line breakage resulting from inadequate metal step
coverage layout constraints.

The layout problems of the DP0 half-adder were corrected (to eliminate the
cause of metal breaks) on the DP1 design. The half-adder was subsequently
used as a standard cell to form a 4 + 4 adder and a 3 x 3 (magnitude only)
multiplier.

Analytical results provided by computer modeling of the full- and
half-adders made it apparent that the size of the T-gate should be 2.8 to 3.0
times the area of the D-gate in order to enhance the functional performance
of the charge sense electrode. It also became apparent that the area under
the master end, M, of the charge sense electrode should be as small as
possible so as to make the voltage change transmitted to the charge transfer
electrode as large as possible.

There were two different half-adder layouts used on the DP1 mask. The
first was used in the multiplier and adder along with the full-adder and was
identical in design to the full-adder except that one input and the
intermediate gate, I, were removed. These half-adder and full-adder
combinations all had the large T-gate, small M-gate combination, and the
arrays containing them functioned extremely well.

The  second  half-adder  was  a  completely  different  design  and  did  not 
utilize  a  large  T-gate.  This  half-adder  design  was  used  in  the  3x3 
and  4+4  arrays  that  used  the  half-adder  exclusively.  After  developing 
special  three  level  clocks,  these  half-adders  and  the  multiplier  and  adder 
arrays  containing  them  were  made  to  function. 

In the next design (the DP2 mask), we used the cascaded half-adder to
form a full-adder as shown previously in Figure 3-8. The half-adder design
was simplified by eliminating the precharge capacitance and using the FET
to preset the charge transfer electrode to the transfer surface potential
as well as to remove any initial charges.

As mentioned previously in Section 3.3 (regarding DP2 full-adders),
difficulties were experienced in using a slip-mask approach, and the
development of the DP3 designs was well underway before we received any DP2
devices. Consequently, very little testing was carried out on the DP2
half-adders.

It was at this time that we discovered that by changing the location of the
carry output from the master side of the charge sensing electrode to the D
input storage gate, we were guaranteed of generating a full charge packet
each time there were two or more binary "1" inputs (full-charge packets)
into the D input area. Thus, we had an automatic charge refresh
capability at the carry-out port of all half-adders on the DP3 and subsequent
designs.

Another  major  improvement  on  the  DP-3  half-adder  was  to  change  from 
using  a  floating-gate  capacitance  at  the  master-side  of  the  charge  sense 
element  to  the  use  of  the  precharge  FET  source  as  a  floating  diffusion. 

Both  the  16+16  adder  and  the  8x8  multiplier  on  the  DP3  chip  were 
designed  from  cascaded  half-adders  and  OR  gates.  Tests  were  performed 
on  both  the  floating-gate  and  floating-diffusion  half-adders  and  it  was 
shown  that  the  floating-diffusion  half-adder  had  greater  sensitivity  and 
better  transfer  efficiency. 

In  December  1977,  in  response  to  a  desire  for  higher  clock  speeds,  we 
changed  from  the  two-phase,  p-channel  structure  to  a  three-phase  n-channel 
structure.  The  channel  was  defined  by  an  implanted  p+  channel  stop. 

The  first  N-channel  evaluation  mask  was  designated  NE-1  and  contained 
a  half-adder  as  a  test  device.  A  schematic  diagram  of  the  NE-1  half-adder 
is  shown  in  Figure  3-15.  This  half-adder  was  shown  to  function  correctly 
in  February  1978,  and  was  used  as  one  of  the  test  vehicles  for  the 
radiation  tests  (see  Section  5.0). 

During the radiation testing of the NE-1 half-adder it was found that
the surface potential shift of the charge sense gate did not track well with
the surface potential shift of the other CCD gates in the half-adder. The
reason for this is that, with the drain of the precharge FET connected to
its gate, there are two threshold voltage drops involved with the charge
sense gate, as compared to one between the CCD gates and their surface
potentials. Thus, as the threshold voltages changed with radiation exposure,
the surface potential did not track between the charge sense gate and the
other half-adder gates.

Figure 3-15. Schematic of the NE-1 Half-Adder.

This was corrected in the NE-2 half-adder and the exclusive-OR gate
used on the DP5 arithmetic array by connecting the FET drain and gate
electrodes to separate pads. The trade-off was additional silicon area and
an additional bonding pad.

One problem that we found with all of the digital logic and arithmetic
cells on the NE-1 design was the incompatibility between the even number of
gates in a double polysilicon structure and the odd number of clocks in a
3-phase clocking system. This made contiguous interconnection of identical
cells very difficult. Our first approach was to make one storage area from
two overlapping polysilicon gates. This seemed a good approach since both
gates have the same oxide thickness; however, we found in practice that
there was a surface potential bump under the first gate in the overlap
structure. This bump caused some charge to be trapped, resulting in an
excessive transfer loss.

On the NE-2 and DP-5 designs, instead of connecting both halves of the
overlapped gate pair to the same clock line, they were taken to two different
clock lines. Both clock lines had the same phase relationship but the second
gate clock was larger, producing a deeper potential well. The length of the
first gate was reduced, making it a transfer gate, and the length of the second
gate of the pair was increased so that it would store the whole charge.

This  proved  very  satisfactory  and  the  half-adder  test  device  on  DP5  proved 
to  have  excellent  transfer  efficiency. 

The next half-adder to be designed was for the ACD project (see Section
9.3). There were two new requirements for this device. The first was that
it should operate at clock speeds up to 3.5 MHz, and the second was that it
should operate with the same 4-phase clocks required by our current SPS
memory designs.

The  first  requirement  was  met  by  reducing  the  transfer  length  of  all 
the  gates,  and  taking  special  care  that  the  transfer  length  of  the  T-gate 
(the  largest  gate  in  the  half-adder)  was  kept  to  a  minimum.  The  even  number 
of  4  clock  phases  fell  naturally  in  line  with  the  even  number  of  two 
polysilicon  levels  so  that  the  ACD  half  adder  was  easier  to  implement  than 
the  NE1/DP5  3-phase  half-adders. 

In parallel with the design of the ACD half-adder was an earlier version
designed for the FHT projects (see Section 9.2). Although this half-adder is
similar to the NE-1 design in that it is also N-channel, uses a 3-phase clock,
and uses a floating diffusion for charge sensing, it had one significant
difference. The slave-end of the charge transfer electrode is of second-level
polysilicon, whereas the NE1/DP5/ACD half-adders used first-level
polysilicon for the slave-end. The result of this was that the effective
length of the channel under the slave-end of the charge sense electrode was
shorter than in the later half-adder designs, and performance was degraded by
fringing fields from adjacent gates.

The last half-adder designed to date is the ACD-2. This is a further
evolution of the NE1-NE2-DP5-ACD series. The newer half-adder has a
larger storage well to store the sum bit and also a wider barrier electrode
between the D-gate and the master-end of the charge transfer electrode.

3.4.1  Half-Adder  Gain  and  Operating  Margins 

The same mathematical and computer models described in Section 3.3.3 for
a full-adder can be modified and used to calculate the operating margins
of the half-adder. The only modification necessary is to reduce the inputs
from three to two and to remove the second barrier overlap capacitance from
the floating-diffusion node of the charge transfer electrode.

We define the gain, G, of a half-adder (or a full-adder) as

$$G = \frac{\Delta Q_b}{\Delta Q}$$

where ΔQb is the amount of charge blocked and ΔQ is the amount of charge
transferred to the floating-diffusion node of the charge sense electrode. The
gain can be calculated from Equation 3.1 (see Section 3.3.3 for Equations 3.1
through 3.3).

From Equation 3.2, when ΔQ = Q1:

[Expression for G illegible in the source.]

Q1 is the amount of charge that transfers across the barrier to the charge
sense electrode when two thin (attenuated) binary ones (2QT) are transferred
into the D cell and a full binary one charge (Q0) is retained by the
barrier: Q1 = 2QT - Q0.

A  test  procedure  was  developed  for  measuring  the  half-adder  switching 
margins  expressed  as  a  percentage  of  a  binary  "one." 


In order to carry out this measurement, a 700 Hz triangular waveform
is applied to the inject source diode, thus increasing the amplitude of both
input charge packets at a linear rate.

A digital input pattern at a 400 kHz rate is applied to both input
control gates of the half-adder. The input sequence to one control gate
(IN1) is 11101010 and the input to the other control gate (IN2) is 10111010.

The output charge packets from the sum and carry channels of the
half-adder are converted to voltages and displayed on an oscilloscope.

The SUM output displays two traces. The main trace is painted at each
alternate combination of the input pattern, when IN1 and IN2 are both binary
one; it is therefore painted at 200 kHz. The second trace occurs twice
per pattern cycle, when either (but not both) of the inputs is binary one; this
ghost trace is painted at 100 kHz.
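
The two trace rates follow directly from the 8-bit input patterns. As a quick check, the short sketch below counts how many of the eight bit slots fall into each input combination and scales the counts by the 400 kHz bit rate. The function and variable names are illustrative only and do not come from the test procedure itself.

```python
# Input patterns applied to the two control gates, one bit per 400 kHz slot.
IN1 = "11101010"
IN2 = "10111010"
BIT_RATE_KHZ = 400.0


def trace_rates(in1, in2, bit_rate_khz):
    """Return (both_ones_rate, single_one_rate) in kHz for a repeating pattern."""
    pairs = list(zip(in1, in2))
    both = sum(1 for a, b in pairs if a == "1" and b == "1")
    single = sum(1 for a, b in pairs if a != b)
    slots = len(pairs)
    return both * bit_rate_khz / slots, single * bit_rate_khz / slots


main_rate, ghost_rate = trace_rates(IN1, IN2, BIT_RATE_KHZ)
print(main_rate, ghost_rate)  # 200.0 100.0
```

Four of the eight slots carry a one on both inputs (main trace) and two carry a one on exactly one input (ghost trace), which reproduces the 200 kHz and 100 kHz figures.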

To  calibrate  the  test  set-up  correctly,  the  amplitude  of  the  input 
triangular  waveform  must  be  adjusted  so  that  the  slope  of  the  curve  1  (ghost 
trace)  as  shown  on  Figure  3-16  intersects  the  flat  portion  of  curve  3  at  one 
point  (a). 

Curve  1  is  the  SUM  output  of  a  single  binary  one  input  when  its  charge 
is  less  than  required  to  fill  the  half-adder  D  input  storage  well.  The  apex 
of  this  curve  is  where  the  single  one  charge  packet  is  large  enough  to 
exactly  fill  the  D  storage  well. 

If a flat period exists at the apex of curve 1, it indicates that the
amplitude of the triangular waveform is too great and the single binary one
is overflowing across the barrier. The amplitude should therefore be reduced
until the apex is a single point.

Curve 2 is the SUM output resulting from two ramped inputs. Segment d-c
of curve 2 represents the sum output. The total charge of the two ramped
inputs is greater than that required to fill the half-adder D input storage
area. During this portion of the curve, the amount of charge transferred
across the barrier is insufficient to fully change the potential of the charge
sense node from a transfer state to a barrier state.

Figure 3-16. Oscilloscope Display of Sum and Carry Outputs of a
Half-Adder During Margin Testing.

Curve 3 reflects the carry output from two increasing inputs. As the
charge transferred onto the charge sense electrode increases, the carry output
increases and the sum output decreases until the inputs reach full (Q0) value.

The  timebase  of  the  CRT  is  adjusted  so  that  curve  1  completely  fills  ten 
spaces  of  the  graticule,  that  is,  a  binary  one  fills  100%  of  the  graticule. 
Each  space  on  the  graticule  is  therefore  20%  of  the  sum  output  for  two 
binary  one  inputs  displayed  on  curve  2. 

The thin one margin is the flat portion of curve 3 from a to b and the
fat zero margin is the flat portion of curve 2 from c to d. In most
cases the two margins should be of equal amplitude; they can be made equal
by adjusting the preset transfer level of the charge transfer electrode.

A photograph of the DP5 full-adder output curves is shown in Figure
3-17. Due to the non-linearity of the sum output for a single ramped input
(curve a), the test set-up was calibrated so that a linear projection of
curve a intersects the center of the graticule. The photograph shows a
fat zero margin of 25% and a thin one margin of 70%.

3.4.2  Other  Uses  of  Half-Adders 

A half-adder can be used for several other applications in addition to
the exclusive-OR and the AND functions described previously.

3.4.2.1 Refresh and Invert

By connecting one of the inputs of a half-adder to a constant binary
one, the truth table is reduced to that shown in Table 3-4. The functions
in Table 3-4 provide both a charge refresh and a binary logic inverter.

Table 3-4. Half-Adder Truth Table with One Input
Always a Binary One.

      Inputs                      Outputs
   A1      B1      Sum (A1 inverted)    Carry (A1 restored)
   0       1              1                     0
   1       1              0                     1
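
The reduced truth table can be reproduced from the ideal half-adder logic (SUM is the exclusive-OR of the inputs, CARRY is the AND). The sketch below is an idealized logic model only; it ignores charge levels and margins, and the function name is illustrative.

```python
def half_adder(a, b):
    """Ideal half-adder: SUM is exclusive-OR, CARRY is AND."""
    return a ^ b, a & b


# Tie input B1 to a constant binary one and sweep A1.
for a1 in (0, 1):
    sum_bit, carry_bit = half_adder(a1, 1)
    print(a1, sum_bit, carry_bit)
# prints:
# 0 1 0
# 1 0 1
```

With B1 held at one, the SUM port delivers the inverse of A1 and the CARRY port delivers A1 itself, which is the refresh-and-invert behavior listed in Table 3-4.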

3.4.2.2 Fan-Out

The fan-out generator is similar to the charge refresh cell described
above. The only difference is that, instead of transferring in a charge
packet equal to a binary one at each clock phase (constant binary one input),
we transfer in a charge packet that is two to three times the size of a
binary one. Care must be taken in the design of the T-gate and multiple
output ports of the fan-out generator to ensure that the transfer lengths
are exactly the same; otherwise, differences in charge packet size will be
encountered.

Figure 3-17. Sum and Carry Outputs of a DP5 Full-Adder
Showing a 25% Fat Zero Margin and a 70%
Thin Binary One Margin.


3.4.2.3 Frequency Divider

A clock frequency can be divided down by any even integer with a
half-adder by connecting the SUM output, through a shift-register delay, back
to one of the inputs. The input clock is applied to either the source inject
diode or the C1 control gate. If no delay is inserted in the feedback path
between the SUM output and the input, there is one clock period delay through
the half-adder and the output from the carry port will be a pulse train at
half the clock frequency.

If a one-bit delay is inserted in the feedback path then the clock
frequency will be divided by 4. The output frequency is f/[2(t+1)], where f
is the clock frequency and t is the number of stage delays inserted in the
feedback path.
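
The division ratio can be checked with a small logic-level simulation. The sketch below is an idealized model under stated assumptions: a binary one is injected on the clock input every period, the half-adder contributes one clock period of delay, the feedback path starts empty, and all charge-level effects are ignored. The function names are illustrative.

```python
def divider_carry(t, n_clocks):
    """Carry-port bit stream of an idealized half-adder frequency divider.

    t is the number of extra one-bit delay stages in the SUM feedback path;
    the half-adder itself contributes one further clock period of delay.
    """
    delay_line = [0] * (t + 1)      # feedback path starts out empty
    carry_bits = []
    for _ in range(n_clocks):
        clock_bit = 1               # a binary one is injected every clock period
        feedback_bit = delay_line[0]
        sum_bit = clock_bit ^ feedback_bit
        carry_bits.append(clock_bit & feedback_bit)
        delay_line = delay_line[1:] + [sum_bit]
    return carry_bits


def period(bits):
    """Smallest shift (in clock periods) under which the stream repeats."""
    for p in range(1, len(bits) // 2 + 1):
        if all(bits[i] == bits[i + p] for i in range(len(bits) - p)):
            return p
    return None


for t in range(4):
    print(t, period(divider_carry(t, 64)))  # period equals 2 * (t + 1)
```

In this model the carry stream repeats every 2(t+1) clock periods, so t = 0 gives division by 2 and t = 1 gives division by 4, in agreement with the expression f/[2(t+1)].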

3.4.3 Advantages of Half-Adders over Full-Adders

3.4.3.1 Clock Frequency

For the large charge packets used in DCCL's, the transfer of charge is
dominated by self-induced drift. In a half-adder, the φ1 clock is applied
to the D gate. The time duration of the φ1 clock is determined by the time
necessary for an input of 2Q0 charges to fill the input storage area D,
transfer over the barrier, and fill the floating-diffusion well of the
charge sense electrode.

The time required for the initial 2Q0 charge to settle to within one
thermal voltage (kT/q) has been shown to be

[Equation garbled in the source; the legible terms are t_HA, L^3, W and Cox.]

where L is the total length of the electrodes over the input gate, the D
storage gate, the barrier, and the floating diffusion; W is the channel width;
Cox is the oxide capacitance per unit area; μ is the mobility of the carriers;
and (φ1 - φ2) = 2Q0/(L2 W Cox) is the input charge. The potential difference
φ1 - φ2 is the difference in surface potentials between an empty well and a
well with a full charge packet Q0.

At the end of the self-induced drift period, the remaining input
charge has a surface potential of approximately 26 mV (at room temperature)
and is swept out by the fringing fields.
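
The 26 mV figure is the thermal voltage kT/q. As a quick numerical check, the sketch below evaluates kT/q from the SI values of the Boltzmann constant and the elementary charge; the choice of 300 K for "room temperature" is an assumption made here, not a value stated in the report.

```python
BOLTZMANN_J_PER_K = 1.380649e-23       # Boltzmann constant, J/K
ELEMENTARY_CHARGE_C = 1.602176634e-19  # elementary charge, C


def thermal_voltage_mv(temperature_k):
    """Thermal voltage kT/q in millivolts at the given absolute temperature."""
    return 1e3 * BOLTZMANN_J_PER_K * temperature_k / ELEMENTARY_CHARGE_C


# 300 K is an assumed room temperature.
print(round(thermal_voltage_mv(300.0), 2))  # 25.85
```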


The full-adder has an additional transfer area and storage gate that
have to fill when the initial input charge is 3Q0. The self-induced drift
period for the full-adder is

[Equation garbled in the source; the legible terms are t_FA, L^3, W and Cox.]

The ratio in self-induced drift time between a half-adder and a
full-adder is

[Equation garbled in the source; it relates t_FA/t_HA to the length ratio L_FA/L_HA.]

For the specific designs described here, L_HA = 1.4 mil and L_FA = 2.6 mil.
Consequently, the φ1 period for the full-adder will be 2.1 times that required
for the half-adder.

The clock period for 3-phase full-adders and half-adders can be divided
into two periods: the period in which the charges are equalizing while the φ1
clock is negative, and the period in which the φ1 clock is positive and the
other clocks are controlling the charges. In a half-adder, the first period is
approximately 40% or 0.4t and the second period is 60% or 0.6t. In a
full-adder the first period is 2.1 x 0.4t = 0.84t, so that the total time for a
full-adder is 1.44t, compared to t for a half-adder.
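
The clock-period comparison above is simple arithmetic and can be written out as a short sketch. The function and parameter names are illustrative; the 40% equalization fraction and the factor of 2.1 are the values quoted in the text, and the remaining 60% of the period is assumed to be unchanged between the two cell types.

```python
def full_adder_period(t_half_adder, equalize_fraction=0.4, drift_ratio=2.1):
    """Full-adder clock period for a given half-adder clock period.

    Only the charge-equalization part of the period is stretched by the
    self-induced drift ratio; the rest of the period is left unchanged.
    """
    equalize = equalize_fraction * t_half_adder
    remainder = (1.0 - equalize_fraction) * t_half_adder
    return drift_ratio * equalize + remainder


print(round(full_adder_period(1.0), 2))  # 1.44
```

Under these assumptions a half-adder can be clocked about 1.44 times faster than a full-adder built with the same technology.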

3.4.3.2 Signal-to-Noise Ratio

A half-adder requires one input port to the floating diffusion on the
master side of the charge sense electrode.

The additional channel to the intermediate storage area in a full-adder
requires that an output port be added to the floating-diffusion node of the
charge transfer electrode.

The additional polysilicon-diffusion capacitance due to this output port
is added to the charge sense electrode parasitic capacitance, thereby reducing
the sensitivity and gain of the full-adder as compared to the half-adder.

3.5  I/O  CIRCUITRY 


3.5.1  Input  Circuits 

3.5.1.1 DCCD Input Charge Generation

The basic scuppering (also known as "fill and spill" or "potential
equilibration") input scheme is depicted schematically in Figure 3-18. The
timing shown is for a 3-phase CCD structure. There are two variations of
this scheme: one with the data input on the C1 electrode, and the other with
the data input on the inject diffusion. Both versions use the same voltage
timing.

Figure 3-18. Schematic Diagram Showing the Basic Charge Input
Generation Technique.

The generation of a DCCD logic one charge packet can be explained from
Figure 3-18. At time sector 0, the C1 surface potential is brought to 4 volts
and the φ1 clock assumes a blocking level of 1 volt. At sector 1 the
inject diffusion lies at 2 volts, and the C2 surface potential goes to 8 volts.
The inject diffusion is now forward biased and electrons (holes in the case of
p-channel devices) are injected across the surface under C1 and C2 until
those surface potentials reach 2 volts. At sector 2, the inject diffusion is
taken back to 7 volts. Electrons in the C1, C2 well now spill back into the
inject diffusion across the 4 volt barrier of C1. This scuppering process is
allowed to continue up to sector 4. By this point in time the surface potential
under C1 and C2 will be approximately 4 volts. This leaves a packet of
electrons trapped in the C2 potential well. The size of this packet can be
readily determined by integrating the C2 well capacity with respect to surface
potential from 4 to 8 volts. At sector 4 the C1 surface potential is switched
back to 1 volt and the φ1 surface is brought to 8 volts. Now the charge
packet under C2 redistributes itself equally between C2 and φ1. At sector 5
the C2 surface potential returns to 1 volt. During this transition the
remainder of the charge packet is moved under φ1. Thus a charge packet,
representing a DCCD logic one, is generated and shifted out of the input
structure. In the case of a logic zero, no charge is captured in C2 as C1
remains at 1 volt during sectors 1 through 4.
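
The packet-size integral mentioned above can be sketched numerically. The example below is illustrative only: it assumes the C2 well capacitance is constant and equal to the oxide capacitance of the gate (depletion capacitance is neglected), and the oxide thickness and gate area are hypothetical values chosen for the example, not dimensions taken from this report.

```python
EPS0_F_PER_M = 8.8541878128e-12        # vacuum permittivity
K_SIO2 = 3.9                           # relative permittivity of thermal SiO2
ELEMENTARY_CHARGE_C = 1.602176634e-19


def packet_charge(cap_of_potential, v_start, v_stop, steps=1000):
    """Midpoint-rule integral of well capacitance (F) over surface potential (V)."""
    dv = (v_stop - v_start) / steps
    total = sum(cap_of_potential(v_start + (i + 0.5) * dv) for i in range(steps))
    return total * dv


# Hypothetical geometry (assumptions, not from the report).
T_OX_M = 100e-9                        # assumed oxide thickness: 100 nm
AREA_M2 = 10e-6 * 10e-6                # assumed gate area: 10 um x 10 um
C_WELL_F = EPS0_F_PER_M * K_SIO2 / T_OX_M * AREA_M2

# Integrate from the 4 V spill level to the 8 V empty-well level.
q_packet = packet_charge(lambda v: C_WELL_F, 4.0, 8.0)
print(q_packet, q_packet / ELEMENTARY_CHARGE_C)  # coulombs, electrons
```

With a constant capacitance the integral reduces to the capacitance times the 4 volt swing. Passing a potential-dependent function as `cap_of_potential` would allow a more realistic well model to be substituted.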

3.5.1.2 Input Buffers

A TTL receiver has been developed for the DCCL logic family. It is designed
to operate the C1 electrode of the standard input circuit shown in Figure
3-18. Figure 3-19 shows a schematic of the TTL receiver with its output tied
to the standard input structure (labeled CI1000). This receiver is designed
to operate with low-power Schottky TTL. Q1 of this circuit provides
dynamic pull-up for the TTL. A single receiver is capable of driving 3 DCCL
inputs at a 5 MHz clock rate.

In the second scheme mentioned above, where the inject diffusion is used
as the data input port, the formation of logic ones is identical to that
described above with the exception that, to form a logic zero, the inject
diffusion must be kept at 7 volts throughout the clock period. In contrast
to the first input scheme, C1 is cycled between 1 and 4 volts every clock
period and the polarity of the input is reversed (i.e., for the C1 input, the
most positive level yields a logic one; for the inject input, the most
negative level yields a logic one).

3.5.2 Output Circuits

There are three main output buffers used for DCCL: the simple source
follower, the two stage source follower with self biasing, and the TTL
transmitter, which converts DCCL signals to TTL levels.

3.5.2.1 Source Follower Output Circuit

A schematic of the simple source follower is shown in Figure 3-20. This
is the most general-purpose output buffer for research applications, as it is
composed of only three transistors and all circuit nodes are independently
accessible for biasing flexibility. A functional description of this interface
circuit is given below.

At sector 5 (see Figure 3-20 timing diagram) the collapsing φ3 surface
potential pushes its charge packet across the semiconductor surface and over
the Vg potential barrier to the n+ output diffusion. This changes the potential
of the diffusion from its reset level of V0 to a less positive level. A
logic one charge packet brings the diffusion potential to V1. This potential
will persist until sector 1, where φ3 takes a positive transition in voltage.
This transition capacitively couples onto the output diffusion to an
extent that depends on layout. At sector 2 the φR electrode is strobed to a
positive potential such that V0 = VRR - VTH. By sector 4 the reset
condition is stable and φR is switched to its negative level, thus turning
Q1 off. This negative transition is capacitively coupled to the output
diffusion and is typically quite pronounced; it is commonly referred to as the
reset pedestal. At this point the output diffusion is ready to accept another
signal charge packet from the CCD channel.

3.5.2.2 Two Stage Source Follower Output Circuit

In the course of technology development the interface circuits must meet
the demands of higher frequency operation. To this end, the two stage source
follower shown in Figure 3-21 has been used throughout the development of
n-channel CCD. This buffer operates in the same manner as the simple source
follower  described  above  with  regard  to  the  acceptance  and  resetting  of  signal 
charge.  The  second  follower  stage,  and  Q5  provide  additional  buffering  to 
the  external  probe  capacitance  placed  on  the  VQ  node.  Qg  and  Q7  form  a 
voltage  divider  which  is  used  to  bias  the  load  devices  Qg  and  Qg.  This  buffer 
will drive a 7 pF load with a small-signal gain of 0.9 up to a frequency of
5 MHz.

3.5.2.3 DCCD to TTL Tri-State Output Circuit

For the purpose of interfacing between DCCL and TTL, a tri-state output
buffer has been developed. Figure 3-22 shows a schematic of this type of
buffer, which is currently slated for use on the fast Hadamard transform
device FHT-1. This circuit is composed of a source follower input stage
(Q1  through  Q4).  The  output  of  this  stage  is  capacitively  coupled  through 
Cl  to  the  input  of  the  inverter  stage  Qg  and  Q^.  The  inverter  stage 
is  dynamically  biased  by  0A  through  Qg.  The  inverter  output  is  sampled 
into  the  next  stage  by  05  through  Qg.  The  next  stage,  Qg  through  Q^,  forms 
Device    Gate Width    Gate Length
Q1           0.6           0.5
Q2           0.6           0.5
Q3           0.7           0.35
Q4           0.7           2.5
Q5           0.6           0.35
Q6           0.4           1.4
Q7           0.7           0.35
Q8           0.7           0.35
Q9           1.1           1.4
Q10          2.0           0.35
Q11          2.0           0.35
Q12          1.0           1.4
Q13          3.2           0.35
Q14          3.2           0.35
Q15          5.9           0.35
Q16          5.9           0.35
Q17          0.4           1.4
Q18          0.7           0.35
Q19          0.4           0.4
Q20          0.4           0.4

Figure 3-22. Schematic of a Tri-State DCCD to TTL Output
Circuit Transmitter.
a cross-coupled inverter pair which switches the output transistors Q15 and Q16
in a complementary fashion. In the tri-state mode EN1 is low and the inverter,
composed of Q[?] and Q[?], turns both output transistors off through Q[?] and
Q[?]. The load devices Q[?] and Q[?] are also turned off through Q[?]. This was
done in order to minimize the static current drawn in the tri-state mode and
thus minimize the power dissipation.

The tri-state impedance is greater than 1 MΩ. The maximum operating
frequency is 5.0 MHz driving a single low-power Schottky TTL input.

3.6  MEMORY  CELLS 

One of the most significant advantages of DCCD technology is the ability
to integrate high-density, large-scale memories with high-density digital
logic circuits. A study of CCD memory applications indicated that a gap
existed between existing high-speed, high-power bipolar and MOS memories
and slower, high-density magnetic memories; it was thought that this gap
could be filled by CCD memories that combined the advantages of very high
density, moderate speed and extremely low power requirements. Although CCD
memories are significantly slower than the fastest bipolar memories, they
are significantly faster than disc and bubble memory storage systems. The
extremely high density of CCD memories permits their use for bulk storage
applications. They can also be used effectively for both comparator and
correlation functions, in addition to accumulation functions.

During the course of the memory development program, it became evident
that the complex offset gate geometry (to be discussed later in this section)
would have a significant impact on memory cell yields. The advantages of
much higher bit density were offset by potential yield losses, due to a factor
of two increase in processing complexity. Combining a large number of
offset gate memory cells for bulk storage applications appeared infeasible
due to potential yield losses.

During 1975 and 1976, processing capabilities improved sufficiently to
ensure fabrication of conventional SPS memory blocks that employed 5 micron
spacing. Memory block density therefore improved sufficiently to compete
with offset gate device density; this was accompanied by yields that indicated
the feasibility of interconnecting 2-kilobit memory blocks into 128-kilobit
or 256-kilobit (large-scale) memory arrays.

3.6.1 The Offset Gate Device Configuration

The initial design of the offset gate device consisted of two adjoining
transfer-storage (t-s) regions, as shown in Figure 3-23. One t-s pair was
connected to the Phase I clock, while another t-s pair was connected to the
Phase II clock. The t-s regions were formed by electrically connecting
adjacent pairs of overlapping gate structures. The properties of the two
types of gates were chosen to create a transfer and storage region for the
clocked charge packets, when connected to the same clock phase. Each pair
of transfer and storage regions provided one minimum geometry unit, ℓ,
equivalent to one-half of a bit length.

The offset gate device required either one or two more mask layers than
a conventional two-phase overlapping-gate structure. Compared to
conventionally configured devices, the offset gate geometry has a significant
bit packing density advantage of 2x for both linear and parallel arrays,
while the advantages of the offset gate structure over conventional
serpentine and series-parallel arrays can be as great as a factor of four.
Since each polygate electrode pair was connected to the same clock phase,
short-circuits did not create functional problems when occasional strands of
polysilicon shorted adjacent polygate structures together. Additionally, the
two-phase clocking arrangement had a significant effect on the total number
of contacts necessary for a given memory block. The conventional structure
requires a minimum of two contacts per bit along any shift register
that cannot be accessed from both sides; the offset gate device required
contacts only to the initial clock line inputs.

The concept of the offset gate device is shown in Figure 3-23. The first
mask step is used to define the offset steps in the gate insulator, along the
length of the CCD channel. The first gate electrode structure, comprised
of polysilicon, is then aligned to the partially covered regions of thin
dielectric (region A) which, in this instance, is thermally grown oxide. The
polygate electrode also covered the thick dielectric (region B). The
step in the insulator beneath this gate created the t-s pair within each
element of the gate structure. A step in the insulator also existed in the
regions not covered by the gate electrodes, indicated in the diagrams as
regions C and D; these regions had to be modified to produce a t-s pair for
the subsequently deposited and defined metal gate electrode.


Figure 3-23. Offset gate device.
a) Alignment considerations of offset masks with respect to the gate masks
b) Use of an offset mask to produce a change in insulator thickness
beneath a gate structure
c), d) Illustration of the use of Si3N4 and ion implantation, respectively, to
produce the desired asymmetry for two-phase operation

When an insulator step was used to define the t-s region, only one offset
mask was required. After forming the first gate electrode structure, the gates
were used as a mask to selectively remove and regrow the exposed dielectric
in the channel region. A second offset mask could then be used to define a
properly oriented t-s pair beneath the overlapping gate structure, to ensure
that the charge packets would be permitted to flow in the correct direction.

Figure 3-23(a) shows in cross-section two of the masks necessary to
make the offset gate structure and Figure 3-23(b) shows the two gate masks
necessary for a standard (or conventional) overlapping-gate two-phase CCD.

In Figure 3-23 the basic geometry unit is designated ℓ, while r represents
the mask alignment tolerance. Table 3-5 contains some comparisons for the
two structures.

The  design  of  the  conventional  two-phase  overlapping  gate  CCD  makes  it 
independent  of  any  mask  misalignment  m,  that  is  less  than  the  maximum  tolerance, 
r.  The  column  of  values  listed  in  Table  3-5  reflects  this  fact.  The  offset 
gate  device  is,  however,  dependent  upon  the  accuracy  of  mask-to-mask  alignment. 

One column in Table 3-5 gives the nominal values when m = 0; the other column
lists the appropriate values for m < r. Since m can be positive or negative,
the sign associated with the worst-case situation is shown. In the nominal
case, the offset gate device had more than a factor of two improvement in
power dissipation, more than a factor of four improvement in maximum operating
frequency, and more than a square root of two degradation in signal-to-noise
ratio. The variations are direct consequences of the storage length reduction
in the offset gate device. There is no change in the parallel edge effect,
since both device configurations are assumed to have the same channel width.

It is worth noting that the loss in signal-to-noise ratio can be regained by
allowing wider channels for the offset gate device. This will produce
signal-to-noise ratios equal to, or better than, those for the conventional
device, while maintaining a bit area advantage for the offset gate device.

CCD-1 was the designation of the mask set that was in use when this program
was initiated. The mask set was designed to allow production of conventional
two or four phase CCD shift registers in both serial and serpentine organizations.
Offset gate devices could also be produced with both serial and serpentine
organizations. Table 3-6 summarizes the shift register devices that appeared
on mask set CCD-1. The table includes the bit length of the serial devices
Table 3-5. Implementation comparison for equal channel widths.

Assuming fast surface state noise is the dominant component.

* Assuming a clocking scheme that allows interleaving data; no such interleaving
is required in the offset gate device.


Table 3-6. CCD-1 shift register devices.

                 Device Length   Serial Bit Length (Microns) or
Device Type      (Bits)          Serpentine Bit Density (Microns²/Bit)   Device Organization

Standard         8               86.4                                    Serial
Standard         8               43.2                                    Serial
Standard         64              43.2                                    Serial
Standard         128             1187.                                   Serpentine
Standard         128             1187.                                   Serpentine with Refresh
Offset Gate      8               20.3                                    Serial
Offset Gate      24              309.7                                   Serpentine
Offset Gate      72              309.7                                   Serpentine
and the bit density of the serpentine devices. In all cases, the CCD channel
width was 7.5 microns. In addition to producing offset gate devices, CCD-1
allowed evaluation of the effects of turning corners, as is required by
serpentine organizations. An examination of the results obtained by testing
these devices indicated that the offset gate geometry was producible and
operated as expected, when compared to conventional shift register-type device
configurations. Tests also indicated that the serpentine devices suffered no
degradation in performance as a result of corner-turning. The success of
CCD-1 was used as a basis for the generation of a second mask set, designated
LSM-1.

The  LSM-1  mask  set  was  devoted  primarily  to  the  production  of  offset  gate 
devices.  The  mask  set  was  designed  to  allow  further  evaluation  of  these 
devices,  particularly  the  serpentine  organization.  In  addition,  refresh 
circuits  were  designed  and  included  on  this  mask  set  for  the  first  time. 

These refresh circuits were specifically designed to fit within the space
of two serpentine shift registers; this restriction necessitated the use of
several untested circuit concepts. While the LSM-1 mask set did provide
additional experience producing offset gate devices, its main impact on the
program was the experimental result indicating that the use of silicon nitride
in the gate structure was undesirable. This conclusion was based upon processing
difficulties caused by the Si3N4, rather than upon high fast surface state
density (Nss) levels. It was also discovered that the refresh circuit designs
included on the LSM-1 chip did not function in the required manner. While a
fix for this problem was devised, the continuing evaluation of overall system
trade-offs indicated that an SPS structure was far more useful than a
serpentine structure. The net result was that the refresh circuit design for
the serpentine structure was set aside. Consequently, a new mask set design
that concentrated on the SPS memory was begun and was designated CCD-2.

The serial-to-parallel-to-serial (SPS) shift register design concept has
been known for some time. Figure 3-24 illustrates the organization of such a
shift register as well as the lines of data flow. The data comes in serially,
is then clocked along the parallel registers and finally shifted out of the
memory cell serially.
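The data flow described above can be sketched in a few lines of Python. This is
only an idealized model, not the device of Figure 3-24: the function name
sps_stream and its parameters are hypothetical, charge packets are treated as
ideal bits, and clock phases, strobes and transfer loss are ignored.

```python
from collections import deque


def sps_stream(bits, n_serial, n_parallel):
    """Pass a bit stream through an idealized SPS block.

    n_serial   -- bits held by the input (and output) serial register
    n_parallel -- number of stages in each parallel channel
    (both names are assumptions made for this sketch)
    """
    # The parallel section starts empty: n_parallel rows of zeros.
    parallel = deque([[0] * n_serial for _ in range(n_parallel)])
    out = []
    for start in range(0, len(bits), n_serial):
        row = list(bits[start:start + n_serial])
        row += [0] * (n_serial - len(row))   # pad a short final row
        parallel.appendleft(row)             # broadside transfer into the parallel section
        out.extend(parallel.pop())           # oldest row leaves through the output serial register
    return out


if __name__ == "__main__":
    data = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0] + [0] * 12
    print(sps_stream(data, n_serial=4, n_parallel=3))
```

In this model each bit reappears at the output n_serial x n_parallel positions
after it entered, which is the kind of fixed input-to-output delay a functional
test of such a memory would look for.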


Operation of an SPS register was different for the offset gate device
than for the standard or conventional two phase device. The difference arises
from the fact that maximum bit density is dependent upon the spacing between
the parallel shift registers. Thus, the optimum spacing of parallel channels
corresponds exactly to the spacing of the clock phases in the offset gate
serial register, whereas it corresponds to a distance equal to one-half the
spacing of the phases in the standard serial register, as shown in Figures 3-25,
3-26, 3-27, and 3-28. Consequently, to achieve maximum density with an SPS
memory built from a conventional structure, it is necessary to interleave
the parallel registers in the following manner:

The input serial register is loaded until all data resides under Phase III
serial gates; then a parallel transfer of data occurs. This event is followed
by a reloading of the input serial register, such that all data now resides
under Phase I serial gate structures. Parallel transfer of the data again occurs,
thereby ensuring that all data is stored in the first location of the
interleaved parallel shift registers. The parallel register is clocked one bit
and the serial load procedure is repeated. A similar procedure is used
again as a means of unloading the data through the output serial register.

In order to achieve maximum bit density with a conventional or standard SPS
structure, it is necessary to provide a special clock waveform for the strobe
lines that allow data to enter and leave the parallel registers. The strobe
occurrence must be synchronized to the Phase I clock for one-half of the cycle,
then synchronized to the Phase III clock for the following half cycle. The high
density structure permits a simpler clocking scheme for the strobe lines,
because only one serial load per parallel clock is required
instead of two as described above for the conventional structure.
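The two-load interleaving used by the conventional structure can be pictured
with the following minimal sketch. The function names are hypothetical, and the
assignment of the first load to the even-numbered parallel channels and the
second load to the odd-numbered ones is an assumption made only for
illustration; the text does not state which set of channels receives which load.

```python
def interleave_loads(first_load, second_load):
    """Merge two serial loads into one row of interleaved parallel channels.

    first_load  -- bits sitting under the Phase III serial gates at the first strobe
    second_load -- bits sitting under the Phase I serial gates at the second strobe
    """
    if len(first_load) != len(second_load):
        raise ValueError("both serial loads must hold the same number of bits")
    row = [0] * (2 * len(first_load))
    row[0::2] = first_load    # assumed: first load feeds the even channels
    row[1::2] = second_load   # assumed: second load feeds the odd channels
    return row


def deinterleave_row(row):
    """Split a parallel row back into the two serial unloads."""
    return row[0::2], row[1::2]


if __name__ == "__main__":
    row = interleave_loads([1, 1, 0], [0, 1, 1])
    print(row)
    print(deinterleave_row(row))
```

The offset gate (high density) structure needs no such step: one serial load
fills every parallel channel, so the equivalent model is a plain copy of the
serial register contents.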

To appreciate the implications of the two types of CCD organizations,
consider Figure 3-29. The minimum geometry dimension permitted by the mask
design rules was designated by ℓ; the minimum mask overlap required was labeled
r. Part a) of Figure 3-29 shows the well-known result that a total length of
4(r + ℓ) is required for each bit of a standard structure CCD. Similarly, part
c) shows that a length of 4ℓ is required to build a high density CCD, starting
with the structural design. Part b) of Figure 3-29 shows what can be done if
a standard structure design is implemented using the high density technique.

Note  that  there  is  a  factor  of  two  improvement  in  required  bit  length  in 
comparison  to  the  structure  of  part  a)  and  the  best  improvement  is  obtained 
Figure 3-27. Serial to parallel interlaced SPS structure.

Figure 3-28. Clocking system required to change from
LIFO read-out to read-in.

Figure 3-29. Comparison of device structures

(a) Standard device structure

(b) High density device built with
original standard structure design

(c) High density device built with
original high density structure design

by the structure of part c). Figure 3-30 quantifies this statement, by showing
the required bit length as a function of (r/ℓ).


Examination of the parallel section of an SPS memory can provide information
as to the required area per bit. This is provided in Table 3-7, where
data interleaving is assumed to exist for the standard structure.

Figure 3-31 provides the required area per bit as a function of the (r/ℓ)
ratio. Thus, it is seen that for any SPS memory, the high density (offset gate)
approach will always have an area advantage of at least a factor of two. In
essence, the high density approach will produce at least twice as many bits
for a given chip size as the standard approach.
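The factor-of-two claim follows directly from the expressions of Table 3-7 and
the bit lengths quoted for Figure 3-29. The sketch below evaluates them; the
function and key names are illustrative, and the bit length of case (b) is taken
as half of case (a), as stated in the text.

```python
def bit_length(l, r):
    """Required length per bit for the three cases of Figure 3-29."""
    return {
        "standard": 4 * (r + l),                    # part a)
        "high_density_from_standard": 2 * (r + l),  # part b): factor of two shorter than a)
        "high_density": 4 * l,                      # part c)
    }


def area_per_bit(l, r):
    """Required area per bit, using the expressions of Table 3-7."""
    return {
        "standard": 8 * (l + r) ** 2,
        "high_density_from_standard": 4 * (l + r) ** 2,
        "high_density": 4 * l ** 2,
    }


def area_advantage(l, r):
    """Standard area divided by high density area: 2 * (1 + r/l) ** 2, never below 2."""
    areas = area_per_bit(l, r)
    return areas["standard"] / areas["high_density"]


if __name__ == "__main__":
    for l, r in [(7.5, 1.0), (5.0, 1.0), (7.6, 2.54)]:
        print(l, r, area_per_bit(l, r), round(area_advantage(l, r), 2))
```

Because the ratio is 2(1 + r/ℓ)², the advantage equals two only when the
misalignment tolerance r is zero and grows as r becomes a larger fraction of ℓ.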

Assuming the implementation of an SPS organization, there are several
methods of optimizing the structure. The two most important approaches are:

(a) minimize the total number of transfers, N_T, necessary to move a data bit
from input to output, and (b) minimize the total power, P, required to operate
the SPS device.

In performing these minimizations, it is reasonable to assume that the
total storage capacity, S, and the input frequency, f, are dictated by other
system requirements and therefore are fixed. Table 3-8 lists the basic
structure. C_BIT is the capacitance per bit (including both Phase I and Phase II
components) and is assumed proportional to the total storage area per bit.

Table 3-9 shows the parameter values for the two optimization approaches while
Table 3-10 compares these two approaches. Note that as S becomes large, the
ratios in Table 3-10 approach limiting values. For the offset gate structure,
both ratios approach a value of 1.06; this indicates that for
a large storage capacity, S, a design that is optimized for minimum power will
in fact require only 6% more transfers than a design that is optimized for
minimum transfers. The same is true for the power required by a design optimized
with respect to transfers. In actuality, the two optimum conditions are
quite similar. The standard structure, on the other hand, shows a limiting
value of 25% between the two design approaches. It is evident that the offset
gate structure can achieve near optimum operation with respect to both N_T and P
with one design, while the standard structure cannot. (Unfortunately, when
ease of fabrication and yields are taken into consideration, the ability to
build the standard memory cell outweighs the offset gate configuration in
every respect.)
Table 3-7. Required area per bit for the devices of Figure 3-29.

Device Type                                        Area per Bit

Standard Device From Standard Structure            8(ℓ + r)²
High Density Device From Standard Structure        4(ℓ + r)²
High Density Device From High Density Structure    4ℓ²

Table 3-11 compares the offset gate and standard structures for both of
the optimizations. The table indicates that the number of transfers in the
standard structure is less than the number required by the offset gate structure,
when both are designed for minimum N_T. On the other hand, the offset gate
structure will require less power when both are designed for minimum P; the
actual value of the power ratio depends upon the value of (r/ℓ), as expected.
Assuming a photolithographic capability of 7.5 microns and an optimistic
misalignment tolerance of one micron, Table 3-11 predicts that the standard
structure requires 27% more power. If a 5 micron photolithographic capability
is assumed, a power increase of 40% can be expected. This holds true because
the misalignment tolerance does not scale down with an improvement in
photolithographic capabilities. The implications are clear; as technology
improvements permit greater device density, the advantages of implementing the offset
gate geometry would seem to increase. Factors that limit the use of the offset
gate device technology below 5 microns will be covered in a subsequent discussion.

One additional advantage of the offset gate structure should be noted here.
The ability to use a single level of metal or polygate structures, covering
both storage and transfer regions, permitted the offset gate configuration to
use significantly fewer contacts per SPS memory cell than standard two-phase
designs. It was shown that the standard structure required 4N_S clock line
contacts per SPS, while the offset gate configuration required only the initial
pair. This fact alone should have provided a decided improvement in offset
gate device yields and corresponding reliability, if not counterbalanced by
substantial processing complications that severely limited device yields and
reliability.

The CCD-2 mask set was designed using a minimum geometry gate length of
7.6 microns and a mask misalignment tolerance of 2.54 microns. The basic element
used on CCD-2 was an SPS memory that stored 2048 bits as a standard device, or
4096 bits as a high density device. (One of the original purposes of this
mask set was to produce standard or conventional shift register memory as well
as SPS memory and use both types of memory building blocks for structural
and electrical comparisons. The mask set did not maximize each of the two
structures for bit density, as the ability to do a one-on-one comparison of the
conventional versus the SPS structure would have been lost.) As a matter of
convenience, it was decided to put four SPS memories on each chip; thus
each chip contained a total of 8192 x 2, or 16,384 bits. This particular
chip design was also viewed as an excellent vehicle to determine SPS yields
that were affected by area related factors (i.e., oxide pin holes; random
crystallographic defects; polysilicon and metal pattern-related shorts and
opens; and step coverage).

Table  3-12  lists  all  of  the  mask  levels  designed  into  the  CCD-2  mask  set. 
This  mask  set  permitted  several  variations  of  the  basic  design;  the  mask  set 
had  the  capability  of  producing  a  double  poly  CCD  structure,  in  which  all 
gates  were  formed  from  either  the  first  polysilicon  layer  (Poly  I)  or  the 
second  polysilicon  layer  (Poly  II);  no  aluminum  gate  structures  were  employed 
in  the  double  polygate  configuration. 

Table 3-13 summarizes some of the pertinent factors of the standard CCD
as well as the high density designs included on CCD-2. It is worth mentioning
that the standard SPS was designed to give the highest bit density possible
for the photolithographic capability available at that time (this included
minimum dimensions for line widths of 7.6 microns and maximum misalignment
tolerance of 2.54 microns). The high density SPS was designed to be mask
compatible with the standard SPS structure; thus, the high density SPS
structure's bit density is not the absolute highest density achievable at
that time.
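The chip and wafer capacities quoted for CCD-2 can be cross-checked with simple
arithmetic. The sketch below uses only figures stated in the text and in
Table 3-13 (four SPS memories per chip, 300 SPS blocks per wafer, 4096 or 2048
bits per SPS); the constant and function names are illustrative.

```python
# Figures quoted in the text and in Table 3-13; names are illustrative only.
SPS_PER_CHIP = 4
SPS_PER_WAFER = 300
BITS_PER_SPS = {"high_density": 4096, "standard": 2048}


def capacity(kind):
    """Return bits per chip and bits per wafer for one SPS variant."""
    bits = BITS_PER_SPS[kind]
    return {
        "bits_per_chip": bits * SPS_PER_CHIP,
        "bits_per_wafer": bits * SPS_PER_WAFER,
    }


if __name__ == "__main__":
    for kind in BITS_PER_SPS:
        print(kind, capacity(kind))
```

The products, 1,228,800 and 614,400 bits per wafer, round to the 1.2 x 10^6
and 6.1 x 10^5 entries of Table 3-13, and the high density chip total matches
the 16,384 bits stated above.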

Figure 3-32 is a photomicrograph (SEM) of an SPS memory block taken from
a CCD-2 chip. There are 64 bits in the parallel and serial registers of the
offset gate version; there are 32 bits in each register for the standard
version. Test results from the CCD-2 mask set were encouraging. It was
felt at the time that 4 kilobits were not sufficient for a demonstration device
and that a larger SPS structure was needed. Consequently, the 16 kilobit structures
used on the LSM-2 mask set were designed and fabricated.

The mask set designated LSM-2 was used as a test vehicle for the 16
kilobit SPS structures. The overall layout is shown in Figure 3-33, which is
a photomicrograph of a typical LSM-2 chip. The circuits of primary concern are
the two 16 kilobit SPS blocks (one designed for 7.5 micron photolithography
and shown in Figure 3-34, and the other designed for 5.0 microns, as shown in
Figure 3-35).


Table 3-13. Comparative SPS layout values of CCD-2.

                        High Density SPS    Standard SPS

Bits/SPS                4096                2048
Microns²/Bit/SPS        413.                826.
SPS/Wafer               300                 300
Bits/Wafer              1.2 x 10^6          6.1 x 10^5
Microns²/Bit/Wafer      1.7 x 10^3          3.3 x 10^3
Figure 3-35. 16 K-bit SPS unit designed for 5.0 micron
photolithography.
The designs of the two SPS blocks are quite similar; both have N_S = N_P =
128. This particular choice was dictated by ease of testing rather than by
a desire to achieve any particular design criteria. Choosing a binary number
for N_S and N_P permits the use of simple, available test equipment and control
logic. All of the parameters of interest, such as transfer loss, dark current
and surface state density, can be measured on such a device. A summary of some
of the pertinent design parameters is given in Table 3-13.

Figure 3-36 contains a general schematic plan view of the LSM-2 SPS
layout, showing the overall organization. The dual output circuits are
identified; they allow testing of the serial input register alone. Also shown
are the two input circuits, one for providing a "fat zero" charge and the other
for providing the signal charge. The cross-sections show the general make-up
of the gate structures and indicate some typical oxide thicknesses (approximately
1000 Å). Those gates that could be controlled independently and those that were
bussed together are so indicated.

It  is  worth  noting  that  a  total  of  19  LSM-2  device  lots  were  fabricated, 
including  five  single  polysilicon  lots,  as  well  as  14  double  polysilicon  lots. 
Table  3-14  is  a  compilation  of  the  results  that  were  obtained  from  the  LSM-2 
and  LSM-3  mask  sets. 

3.6.1.1 LSM-3 Device Test Results

Two lots, totaling six wafers, were processed using the LSM-3 processing
sequence. The transfer inefficiency of the memory input serial registers was
.003 - .004 for the 7.5 micron geometry devices and .005 for the 5 micron
devices. The cause of this relatively high transfer inefficiency was subsequently
attributed to the intergate barrier or "bump" problem, discussed elsewhere
in this report. The dynamic range of the offset memory devices was found to be
in the range of 30 to 190 millivolts for the 7.5 micron devices and 40 to 90
millivolts for the 5 micron devices. The low dynamic range of the devices made
with the LSM-3 mask set (as compared to the 400 to 600 millivolts of the LSM-2
devices) was unexpected and probably due to dark current leakage originating
from etch pits at the SiO2/silicon interface.
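To put the quoted inefficiencies in perspective, the sketch below applies the
usual first-order model in which a fraction (1 - e) of a charge packet survives
each transfer, so that (1 - e)^n survives n transfers. The figure of 256
transfers is an assumption chosen for illustration (two transfers per bit
through a 128-bit two-phase serial register); the report does not state the
transfer count used in the measurement.

```python
def charge_retained(inefficiency, n_transfers):
    """Fraction of a charge packet left after n_transfers, first-order model."""
    return (1.0 - inefficiency) ** n_transfers


if __name__ == "__main__":
    ASSUMED_TRANSFERS = 256  # hypothetical: 2 transfers/bit x 128 bits
    for eps in (0.003, 0.004, 0.005):
        kept = charge_retained(eps, ASSUMED_TRANSFERS)
        print(f"inefficiency {eps}: {kept:.2f} of the packet remains")
```

Under these assumptions less than half of the original packet reaches the end
of the serial register, which is consistent with the text's description of the
measured inefficiency as relatively poor.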


Figure 3-36. LSM-2 plan view and cross sectional diagrams.


Table 3-14. Summary of LSM-2 and LSM-3 lots processed.

Description                                          Results

Regular lot
TEOS for step coverage
Used mask 6 & 3 to measure test FETS             }
Used AZ 2400 P.R. for step coverage              }   Poly/metal shorts
Used various postbake conditions                 }
Regular lot                                          Accidental implantation of both
                                                     boron and phosphorous into barrier
Regular lot                                          Evaluated
2nd polysilicon nitride left on                      Poly 1/Poly 2 low breakdown voltage
Modified barrier drive-in conditions                 Evaluated
Variations in processing:                            Evaluated
  (1) phosphorous gettering of storage gate oxide
  (2) thicker storage gate oxide
  (3) plasma etched vs chemically etched polysilicon
Change in process sequence to eliminate nitride      Evaluated
  remaining between polysilicon 1 & 2
20' & 40' barrier anneals                            Evaluated
40' & 100' barrier anneals                           Evaluated
10,000 field oxide                                   Completed, non-evaluated
Only FETS 1400/2800 oxide ratio                      Completed, non-evaluated
No tub, no guard                                     Completed, non-evaluated
Field oxide grown at 1200°C                          Evaluated
Oxide ratio 1400/2800                                Completed, non-evaluated
Only FETS 1400/2800                                  Evaluated
Implant doses 4.5, 5 and 5.5 x 10^12 P32/cm2
Aligned in Cobilt 2020 machine                       Evaluated
Using streamlined process. As channel stops.         1 and 2 evaluated;
  Revised levels 4, 5 & 7                            3 and 4 in process


Yield data taken from both 7.5 micron and 5 micron 16 kilobit memory
devices was significantly higher than expected. The memories were considered
good, or functionally acceptable, if the proper delay between input and output
data was achieved. A functional yield of 86% was obtained for the 7.5 micron
devices; a yield of 70% was obtained for the 5 micron devices. Non-functional
devices were generally found in die locations near the edge of the wafers,
which was expected. Though the efforts to produce high density offset gate
memories were generally considered to be successful, the processing complexity
and difficult device topography indicated a memory technology that would
receive limited acceptance, despite high bit storage density. Inspection of
Figure 3-37 provides an accurate indication of the complex device geometry involved
in producing offset gate memories.


4.0  LARGE  SCALE  INTEGRATION  (LSI)  DEVELOPMENT 


4.1  INTRODUCTION 

In Section 1.1 we discussed, in general, the need for pipelined arithmetic
when using DCCL building blocks. As an introduction to the discussion of such
functions as multiplication, addition, and subtraction, some specific examples
will be of value. If we want to add or subtract two binary words, a1, a2, ...
an and b1, b2, ... bn, we can proceed in the following straightforward manner.


First word             an  ...  a3  a2  a1

Second word            bn  ...  b3  b2  b1

Carry bit              cn  ...  c3  c2

Sum            cn+1    sn  ...  s3  s2  s1

A block diagram of a 2-word, 16-bit adder is shown in Figure 4-1 and a chip
photograph of a DP3 16-bit adder is shown in Figure 4-2. There is a one
clock period delay between the input charge to a full-adder and the generation
of the sum and carry output charges. It will be seen that delay stages have
been added to the input signals of the most significant input channels in
order that the input bits arrive synchronously with the carry bits. If an
unskewed output is desired, it is also necessary to add corresponding
delays to all of the sum output lines except the most significant output bits.
Two levels of charge refresh are included in the layout of the 16-bit adder
of Figure 4-2 and a large MOS output driver buffers the output signals from
the bonding pads.
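The skewing described above can be illustrated with a small clock-by-clock
model. This is a sketch under stated assumptions, not the circuit of
Figure 4-1: the function name pipelined_add is hypothetical, each full-adder is
modeled as ideal logic with exactly one clock of delay, and input bit i is
assumed to be delayed by i clocks so that it meets the carry from stage i - 1.

```python
def pipelined_add(pairs, n_bits):
    """Stream (a, b) word pairs through an n_bits ripple adder pipeline.

    Stage i handles bit i. On clock t it sees bit i of word (t - i), because
    the inputs to stage i are delayed by i clocks, together with the carry
    that stage i - 1 latched for that same word on the previous clock.
    """
    carry_reg = [0] * n_bits                     # carry latched at the output of each stage
    sums = [[0] * n_bits for _ in pairs]         # deskewed sum bits, one list per word
    carry_out = [0] * len(pairs)
    for t in range(len(pairs) + n_bits - 1):     # enough clocks to flush the pipeline
        new_carry = list(carry_reg)
        for i in range(n_bits):
            w = t - i                            # word reaching stage i on this clock
            if not 0 <= w < len(pairs):
                continue
            a, b = pairs[w]
            total = ((a >> i) & 1) + ((b >> i) & 1) + (carry_reg[i - 1] if i else 0)
            sums[w][i] = total & 1
            new_carry[i] = total >> 1
            if i == n_bits - 1:
                carry_out[w] = total >> 1        # carry from the last stage is the extra sum bit
        carry_reg = new_carry
    return [
        sum(bit << i for i, bit in enumerate(s)) + (c << n_bits)
        for s, c in zip(sums, carry_out)
    ]


if __name__ == "__main__":
    words = [(14, 0), (65535, 1), (12345, 54321)]
    print(pipelined_add(words, n_bits=16))
```

In this model a new word pair can enter on every clock, and the complete sum of
a word is available n_bits clocks after its least significant bits entered,
which is why the lower-order sum outputs need extra delay stages if a deskewed
result is wanted.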

A parallel DCCD multiplier uses a much more random interconnect pattern
than the adder array. A random pattern becomes difficult to lay out using DCCL
cells if channel crossings must be avoided. Parallel multipliers involve the
use of AND functions as well as a combinational network of half- and full-adders.
The multiplication of two 4-bit binary numbers, a1 - a4 and b1 - b4, is
performed in the following straightforward manner:


Figure  4-2.  A  2-word,  16-bit  adder  array  of  digital 
charge-coupled  devices. 

Delays  are  also  required  in  the  multiplier  in  order  that  the  summands 
proceed  synchronously  through  the  array.  In  a  similar  manner  to  the  adder, 
additional  delays  are  required  in  the  multiplier  output  channels  if  a  deskewed 
product  is  required. 

The block diagram of a 4 x 4 multiplier is shown in Figure 4-3, for
example, and a chip photograph of a DP3 8 x 8 multiplier array is shown in
Figure 4-4.
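As a rough illustration of how AND functions and half- and full-adders combine
into a parallel multiplier, the sketch below forms each summand with an AND and
accumulates the rows with adder cells. It is a generic shift-and-add array, not
the interconnect of Figure 4-3; the function names are hypothetical and the
pipeline delays discussed above are ignored.

```python
def half_adder(x, y):
    """Return (sum, carry) for two input bits."""
    return x ^ y, x & y


def full_adder(x, y, c):
    """Return (sum, carry) for two input bits and a carry-in."""
    s = x ^ y ^ c
    carry = (x & y) | (c & (x ^ y))
    return s, carry


def array_multiply(a, b, n_bits=4):
    """Multiply two n_bits numbers using only AND, half-adder and full-adder cells."""
    a_bits = [(a >> i) & 1 for i in range(n_bits)]
    b_bits = [(b >> j) & 1 for j in range(n_bits)]
    acc = [0] * (2 * n_bits)                  # running product bits, LSB first
    for j, bj in enumerate(b_bits):
        carry = 0
        for i, ai in enumerate(a_bits):
            summand = ai & bj                 # the AND function forms each summand
            if i == 0:
                acc[i + j], carry = half_adder(acc[i + j], summand)
            else:
                acc[i + j], carry = full_adder(acc[i + j], summand, carry)
        acc[j + n_bits] = carry               # row carry lands in the next free column
    return sum(bit << k for k, bit in enumerate(acc))


if __name__ == "__main__":
    print(array_multiply(13, 11))
```

Each row of the loop corresponds to one row of summands; in hardware the rows
operate concurrently, and delay stages keep the summands aligned as they move
through the array.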


Figure 4-3. A block diagram of a 4-bit x 4-bit parallel
multiplier array.

Figure 4-4. An 8-bit x 8-bit multiplier array of digital
charge-coupled devices.


4.2 EVOLUTION OF DCCD ADDER/SUBTRACTOR DESIGNS


This  section  describes  the  evolution  of  the  adder  and  subtractor  array 
designs  implemented  on  the  DP  mask  series  and  the  Azimuth  Correlator  Device 
(ACD)  mask. 


4.2.1 DP1 Adder Array

The first digital CCD adders were designed on the DP1 mask in November
1975. There were two versions of 4 + 4 bit adders on the DP1 mask. One
adder was designed from ten half-adders as shown in Figure 4-5 and the other
was designed from three full-adders and a single half-adder as shown in Figure 4-6.


The addition of two 4-bit binary numbers a0, a1, a2, a3 and b0, b1, b2,
b3, in which a0 and b0 are the least significant bits, is performed with DCCL
in the following manner:


First Word        a3  a2  a1  a0
Second Word       b3  b2  b1  b0
Carry Bits
SUM

(Carry bit cn is generated by column n-1.)
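The version built from three full-adders and a single half-adder can be
sketched as ordinary ripple-carry logic. The code below is an illustrative
model only (names are hypothetical, and charge-domain effects and pipeline
delays are ignored): the half-adder handles the least significant column, which
has no carry-in, and each full-adder takes the carry generated by the column
below it.

```python
def half_adder(x, y):
    """Return (sum, carry) for two input bits."""
    return x ^ y, x & y


def full_adder(x, y, c):
    """Return (sum, carry) for two input bits and a carry-in."""
    s = x ^ y ^ c
    carry = (x & y) | (c & (x ^ y))
    return s, carry


def add_4_plus_4(a, b):
    """Add two 4-bit words with one half-adder and three full-adders.

    Returns the five output bits, least significant first; the last
    entry is the carry out of the most significant column.
    """
    a_bits = [(a >> i) & 1 for i in range(4)]
    b_bits = [(b >> i) & 1 for i in range(4)]
    out = []
    s, carry = half_adder(a_bits[0], b_bits[0])   # column 0: no carry-in
    out.append(s)
    for n in range(1, 4):                         # columns 1..3 use the carry from column n-1
        s, carry = full_adder(a_bits[n], b_bits[n], carry)
        out.append(s)
    out.append(carry)
    return out


if __name__ == "__main__":
    bits = add_4_plus_4(0b1110, 0b0000)
    print(bits, sum(bit << i for i, bit in enumerate(bits)))
```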




In testing the 4-bit adder designed exclusively from half-adders, two
problems emerged that made a successful demonstration very difficult. The
first problem was the large transfer loss incurred through the three to four
cascaded half-adders associated with the most significant bits. This loss
resulted in sum and carry charge packets that were too small to
switch the charge transfer electrodes. The second problem was the accumulation
of a very large fat-zero where the five most significant carry channels
were "OR'd."

It  also  proved  difficult  to  exercise  this  adder  because  of  the  different 
number  of  CCD  transfers  in  each  column. 

None of these problems occurred in the 4 + 4 bit adder array that was
designed using full-adders. This array was demonstrated successfully in
February 1976 and the test results were published (6) the following September.

Testing  was  initially  carried  out  at  room  temperature  (25°C)  and  at  a 
clock  frequency  of  10  kHz.  The  clock  frequency  was  divided  down  by  16  to 
produce  a  625  Hz  word  rate  so  that  only  one  output  word  was  displayed  on  the 
monitoring  CRT  at  one  time.  By  using  this  technique  we  could  check  that  the 
pipeline  delay  shift  register  stages  were  functioning  correctly  (all  output 
bits  should  be  coincidental  in  time).  The  photograph  of  Figure  4-7  shows  the 
4  most  significant  bits  of  the  output  sum  when  input  word-a  is  1110  and  input 
word-b  is  0000. 

Output  patterns  for  different  input  conditions  were  photographed  at  clock 
frequencies  of  100  kHz  and  175  kHz.  The  logic  "1"  output  levels  do  not  attenuate 
significantly  as  the  frequency  is  increased;  however,  they  become  obscured  as 
the  logic  "0"  output  levels  (fat-zero)  grow  larger.  By  switching  the  b^  input 
and observing the output patterns on the oscilloscope, it was observed that the
adder array performed the correct arithmetic functions up to 200 kHz, but with
a deteriorated signal-to-noise level.

The operating temperature range was determined by functional testing in
a temperature controlled chamber. The clock voltages were adjusted at a
frequency of 11 kHz and at 25°C so that the 3 most significant output bits
from the adder array were performing correctly for each input combination and
with a maximum signal-to-noise ratio. The temperature was increased in 10°


[Not shown, S1 = 0]

Output bit S2 = 1

Output bit S3 = 1

Output bit S4 = 1

Output bit S5 = 0
(Decimal 28)

Figure 4-7. Output signals from the 2-word, 4-bit adder array
when a-word = 1110 and the b-word = 0000.

increments while the inputs were switched and the outputs monitored. At 65°C,
the MSB of the array ceased to function. The S4 sum-bit output remained at
"0", and the S5 carry-bit output remained at "1". The S3 sum-bit output
continued to function correctly. At 110°C the fat-zero level of all outputs
had increased sufficiently that no bits were operable. It should be
noted that the combination of high temperature and low frequency is the most
difficult operating condition from the standpoint of thermal leakage.

Proper  operation  at  125°C  could  be  assured  simply  by  operating  the  existing 
device  at  a  frequency  above  500  kHz. 

The temperature was then reduced to 25°C and all outputs resumed operating
correctly. The temperature was then lowered in 10° steps, the inputs switched
and again the 3 most significant output bits monitored. At -15°C the fat-zero
level was reduced, but there was no further change in arithmetic performance.
The temperature was then reduced to -65°C and the C2 control line adjusted so
that all channels performed correctly with a maximum signal-to-noise ratio.

4.2.2 The DP2 4+4 Bit Adder Array

The DP2 adder arrays incorporated several significant improvements over
the DP1 designs. One of these was the change from metal-polysilicon gates
to double polysilicon gates. This change provided us with three levels of
interconnect: metal and two polysilicon layers insulated from each other
by a layer of oxide. Advantage was taken of the additional interconnecting
level by increasing the density of the layout, resulting in the ability to
directly interconnect full-adder or half-adder cells without any intercell
delay stages. Another improvement was achieved by changing the CARRY out port
from the master node of the charge transfer electrode to the storage area
now referred to as the T-gate. This change provided the capability of
obtaining a fully restored CARRY output charge packet.

There were three adder arrays on the DP2 chip: a 4 + 4 bit array that used
cascaded half-adders, an 8 + 8 bit array that used cascaded half-adders and
an 8 + 8 bit array that used full-adders. Both of the arrays using cascaded
half-adders also used the fully restored CARRY output packets. However, the
adder designed with full-adders used the concepts that were employed
previously on the DP1. The original full adder array design was retained
because of the success of the 4 + 4 bit adder on DP1. A block diagram of the
DP2 4 + 4 bit adder array using the dual cascaded half-adders is shown in
Figure 4-8. A photograph of a processed array is shown in Figure 4-9. This
4 + 4 bit adder array consists of seven half-adders, three OR-gates, fifteen
single-bit shift-registers and five output buffers. The 4 + 4 bit array
processes the data in parallel; thus the two 4-bit numbers are applied
synchronously and the outputs are also available synchronously following a
pipeline delay of four clock periods.
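
The cell counts quoted above can be checked with a small logic-level sketch.
It assumes that the least significant bit uses a single half-adder and that
each higher bit uses a full adder built from two cascaded half-adders and one
OR gate. The function names are illustrative only, and the pipeline shift
registers and charge-domain behavior are not modeled.

```python
def half_adder(a, b):
    """Return (sum, carry) for two single bits."""
    return a ^ b, a & b


def full_adder(a, b, carry_in, counts):
    """Full adder formed from two cascaded half-adders and one OR gate."""
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, carry_in)
    counts["half_adders"] += 2
    counts["or_gates"] += 1
    return s2, c1 | c2


def add_4_plus_4(a, b):
    """Add two 4-bit words; return the 5-bit result and the cell counts."""
    counts = {"half_adders": 0, "or_gates": 0}
    a_bits = [(a >> i) & 1 for i in range(4)]
    b_bits = [(b >> i) & 1 for i in range(4)]

    # Assumption: the LSB stage has no carry-in, so a half-adder suffices.
    s, carry = half_adder(a_bits[0], b_bits[0])
    counts["half_adders"] += 1
    out_bits = [s]

    for i in range(1, 4):
        s, carry = full_adder(a_bits[i], b_bits[i], carry, counts)
        out_bits.append(s)
    out_bits.append(carry)  # fifth output line

    value = sum(bit << i for i, bit in enumerate(out_bits))
    return value, counts
```

Under these assumptions the sketch uses seven half-adders and three OR gates,
which agrees with the counts given for the DP2 4 + 4 bit array.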

Testing of the DP2 4 bit adder array verified full functional operation.
The eight input lines of the adder array were exercised through all possible
sixteen combinations and the output from the array was observed on an
oscilloscope. It was observed that the five array lines produced the correct
output data for each input combination.

Photographs of the 4+4 adder array outputs for various input data
combinations are shown in Figures 4-10, 4-11, 4-12, 4-13.

4.2.3 The DP2 8 + 8 Bit Adder Arrays

4.2.3.1 The Cascaded Dual Half Adder Design

In this adder array the arithmetic is performed using cascaded dual
half-adders. This 8 + 8 bit adder array is an extension of the 4 + 4 bit
array discussed in Section 4.2.2. The addition of two 8-bit binary numbers
a0 - a7 and b0 - b7 is performed with DCCL in the same manner as described
for the 4 + 4 bit array.


First Word     a7  a6  a5  a4  a3  a2  a1  a0

Second Word    b7  b6  b5  b4  b3  b2  b1  b0

Carry Bits     c8  c7  c6  c5  c4  c3  c2  c1


A block diagram of this 8 + 8 bit adder array is shown in Figure 4-14. The
array utilizes fifteen half-adders, seven OR gates, seventy-seven bits of
shift register and eight output buffers. There is a propagation delay of
eight clock periods through this version of the 8 + 8 bit array. A
photograph of a processed array is shown in Figure 4-15.

The 8 + 8 bit adder arrays were operated with the same clock voltages
used for the 4 + 4 bit arrays described previously. All nine output channels
responded correctly to the input signals as verified by the oscilloscope
photographs of Figures 4-16, 4-17, 4-18, and 4-19. In these figures, a_i and
b_i are the input bits, where a0 and b0 represent the least significant bits,
and S_i represents the array outputs.

4.2.3.2 Full Adder Design

The 8 + 8 bit full adder version of the array performs the addition of two
8-bit numbers in the same manner as described in Section 4.2.3.1. A block
diagram of this adder array is shown in Figure 4-20. This array utilizes
one half-adder, seven full-adders, eighty-four single bits of shift register
and eight output buffers. There is a propagation delay of eight clock phases
through this version of the 8 + 8 bit adder array. A photograph of a
processed array is shown in Figure 4-21.

Several mask errors were discovered on the full-adder version which
prevented it from functioning. Since the DP3 mask set (using the cascaded
dual half adder approach) was soon to be completed, we decided not to
procure a corrected version of the DP2 mask set.

4.2.4  The  DP3  Adder  Array 

The DP2 and DP3 designs differ in both geometry and wafer processing.
With respect to geometry, the DP2 design utilized a 7.6 µm minimum geometry
whereas the DP3 design utilized a 5.1 µm geometry.

The DP3 16 + 16 bit adder array performs the addition of two 16-bit binary
numbers in the same manner as the 8 + 8 bit adder array described
in Section 4.2.3.1. A block diagram of the 16 + 16 bit adder array is shown
in Figure 4-22. The full adder blocks shown in Figure 4-22 are composed of dual
cascaded half adders and an OR gate as described in Section 3.3. A change was
made to the basic design by utilizing a full adder for the least significant
bit (LSB) rather than a half-adder. The additional input allows arrays to be
cascaded to handle words of any bit length by using the "carry-in" feature on
the LSB.
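
As an illustration of the "carry-in" feature, the following sketch chains
identical 16-bit adder blocks to add wider words. It is a behavioral model
only; the function names and the choice of two stages are assumptions made
for the example and do not describe the DP3 layout.

```python
def adder_array(a, b, carry_in, width=16):
    """Behavioral model of one adder array with a carry-in on the LSB.

    Returns (sum_word, carry_out), where sum_word is `width` bits wide.
    """
    total = a + b + carry_in
    mask = (1 << width) - 1
    return total & mask, total >> width


def cascaded_add(a, b, width=16, stages=2):
    """Add two (width * stages)-bit words by chaining adder arrays.

    The carry-out of each stage feeds the carry-in of the next stage.
    """
    mask = (1 << width) - 1
    carry = 0
    result = 0
    for k in range(stages):
        part_a = (a >> (k * width)) & mask
        part_b = (b >> (k * width)) & mask
        part_sum, carry = adder_array(part_a, part_b, carry, width)
        result |= part_sum << (k * width)
    return result, carry
```

For example, `cascaded_add(0x0000FFFF, 1)` carries out of the lower 16-bit
stage into the upper one.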

The  packets  of  charge  propagating  through  this  array  undergo  seventeen 
transfers.  Due  to  the  low  transfer  efficiencies  observed  on  the  DP2  arithmetic 
arrays,  two  levels  of  charge  refresh  cells  were  incorporated  in  the  16  +  16 
bit  DP3  array. 

The  first  level  of  charge  refresh  was  inserted  after  nine  or  ten  transfers 
and  the  second  level  after  the  seventeenth  transfer  (immediately  before  the 
charge  packet  is  converted  to  a  voltage  signal  by  the  output  buffer).  The  16  +  16 
bit  array  utilizes  thirty-two  half-adders,  sixteen  OR  gates,  forty  charge 
refresh  cells,  three  hundred  and  forty-one  single  stage  shift-register  bits  and 
seventeen  output  buffers. 

There  is  a  pipeline  delay  of  seventeen  clock  periods  through  the  16  +  16 
bit  adder  array.  A  photograph  of  a  processed  array  is  shown  in  Figure  4-23. 

Figure 4-23. The DP3 16 + 16 Adder Array.


In an effort to reduce the number of clock lines required by the adder
designs, analytical calculations and empirical tests were carried out on DP2
single half-adder test cells. These indicated that the dual half-adder would
function correctly with only five clock lines (plus the inject diode and the
PR reset clock).

The DP3 16 + 16 bit adder array design was based on this five clock scheme.
Unfortunately, due to several gates being tied to the same clock line, race
conditions were encountered which prevented meaningful evaluation of the
16 + 16 bit adder array.

4.2.5  The  DP5,  Adder/Subtractor  Array 

In July 1977, work began on the design of a 32 + 32 bit adder array
designated as DP5. At the same time, a requirement for clock speeds of
5 MHz and above developed in response to near term voice processing
applications. Since a maximum clock speed of 2 MHz was projected for our
current p-channel designs, a decision was made to pursue an n-channel design
for the 32 + 32 bit adder.

An n-channel process was devised, design rules were drawn up and an n-channel
evaluation mask set, designated NE-1, was designed. The successful
demonstration of n-channel full-adders and half-adders on the NE-1 mask
encouraged the continuation of the n-channel 32 + 32 bit adder array.

In  an  attempt  to  satisfy  the  needs  of  a  near  term  voice  processing 
application,  it  was  decided  that  the  DP5  array  should  be  able  to  add,  subtract 
and  perform  the  exclusive-OR  function.  This  array  was  to  have  three  control 
lines  (J,  K,  and  D)  and  perform  eight  different  functions  as  shown  in  Table  4-1. 

Table 4-1. Selectable Functions of the DP5 Array.

J  K  D   Output        Function

1  0  0   A+B           Add
1  1  1   A-B           Subtract
1  0  1   A+B+1         Increment
1  1  0   A+(NOT B)     Add complement
0  0  0   A⊕B           Exclusive-OR
0  1  0   NOT (A⊕B)     Equivalence
0  0  1   Not required in this application
0  1  1   Not required in this application


In  pipeline  DCCL  arithmetic,  where  it  is  required  to  begin  processing  a 
new  word  each  successive  clock  cycle,  the  2's  complement  number  system  is 
ideal  for  addition  and  subtraction.  For  this  reason,  2’s  complement  was 
chosen  as  the  number  system  for  the  DP5  32-bit  array. 

The  following  paragraphs  provide  a  description  of  the  signal  flow  and 
control  line  functions  of  the  final  DP5  array  design.  Two  32-bit  2's  complement 
numbers  synchronously  enter  the  array  and  are  appropriately  skewed  by  CCD 
shift  registers.  One  of  the  input  words  (the  subtrahend  in  the  case  of  the 
subtract  mode)  is  transferred  into  one  of  the  two  input  ports  of  each  of  the 
32  exclusive-OR  gates. 

The control line K determines the binary value of the input to the
other input port of the exclusive-OR gates as shown in Figure 4-24. When
control line K is switched so that a binary zero is entered into the
exclusive-OR gates, the data is transferred through the exclusive-OR without
being changed. However, when the control line K is switched so that a binary
one is entered into the exclusive-OR gates, the data is complemented. The
control line K is also transferred into one input terminal of a half-adder
and a binary one input packet is transferred into the other half-adder input
during each clock period. The half-adder will generate a binary-one charge
packet from its carry-out port each clock period when K = 1 and a binary-zero
charge packet from the carry-out port when K = 0. The carry-out port of the
half-adder is connected to the carry-in port of the least-significant-bit
full-adder in the array. Thus when K = 1, the input data (the subtrahend) is
complemented and the difference is incremented by one.
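
The effect of control line K can be sketched at the word level as follows.
This is a behavioral model only, written under the assumption of a 32-bit
word; the names `add_sub` and `to_signed` are illustrative and do not appear
in the design documentation.

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1


def add_sub(a, b, k):
    """Model of the K-controlled add/subtract path for 2's complement words.

    Each bit of b passes through an exclusive-OR gate whose other input is K,
    so b is complemented when K = 1. The half-adder fed by K and a constant
    one produces a carry equal to K, which enters the LSB carry-in.
    """
    b_after_xor = b ^ (MASK if k else 0)
    carry_in = k & 1  # carry-out of half-adder(K, 1) is K AND 1
    return (a + b_after_xor + carry_in) & MASK


def to_signed(word):
    """Interpret a WIDTH-bit word as a 2's complement integer."""
    return word - (1 << WIDTH) if word & (1 << (WIDTH - 1)) else word
```

With K = 0 the model returns A + B; with K = 1 it returns A + (NOT B) + 1,
which is A - B in 2's complement.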

The output from the 32 exclusive-OR gates and the other 32-bit input data
are transferred into two of the three inputs of each of the 32 full-adders
as shown in Figure 4-25. The carry-out port of each full-adder is connected
to the carry-in port of the next more significant full-adder. Each pair of
bits is added (or subtracted) in turn at each subsequent clock phase.
Therefore the most significant pair is acted upon 32 clock phases after the
least significant pair.

Figure 4-25. Block diagram of the DP5 adder/subtractor exclusive-OR array.


In  order  to  be  able  to  switch  the  function  of  the  array  from  add/subtract 
to exclusive-OR, two-input AND gates are inserted between the carry-out and
carry-in  ports  of  each  full-adder.  A  control  line  S  controls  the  value  of 
the  binary  input  to  the  other  input  port  of  the  AND  gates.  When  S  is  at 
the  binary  zero  value,  the  carry  channel  is  inhibited,  and  the  array  output 
is  the  exclusive-OR.  When  S  is  at  the  binary-one  value,  the  carry-bits  are 
enabled  and  the  array  output  is  the  add/subtract.  There  are  no  clock  phase 
delays  through  a  DCCL  AND  gate,  so  this  extra  functional  capability  does  not 
require  more  skewing. 
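
A bit-level sketch of the carry-inhibit scheme is given below. It models each
full-adder with ordinary Boolean expressions and places an AND gate, driven by
S, between every carry-out and the following carry-in. The function name and
the default width are assumptions for illustration.

```python
def gated_adder(a, b, s, width=32):
    """Ripple adder with an AND gate (controlled by S) in each carry path.

    S = 1 enables the carries, giving the arithmetic sum (modulo 2**width).
    S = 0 inhibits the carries, giving the bitwise exclusive-OR.
    """
    carry = 0
    result = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        sum_bit = x ^ y ^ carry
        carry_out = (x & y) | (carry & (x ^ y))
        result |= sum_bit << i
        carry = carry_out & s  # AND gate between carry-out and next carry-in
    return result
```

For instance, `gated_adder(12, 10, 1)` yields the sum, while
`gated_adder(12, 10, 0)` yields the exclusive-OR of the two words.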

Layout  of  the  DP5  array  was  completed  in  September  1978  and  a  device  lot 
was fabricated. Due to the size and complexity of the DP5 array, many gates
and contacts were later found to be missing from the layout, making it impossible
to  test  the  array.  Since  the  original  voice  processing  application  for  this 
32-bit  DP5  array  had  disappeared  by  this  time,  no  corrections  to  the  DP5  masks 
were  pursued.  A  photograph  of  the  DP5  chip  is  shown  in  Figure  4-26. 

4.2.6  Azimuth  Correlator  Device  (ACD)  10  +  10  Bit  Adder 

The  10  +  10  bit  adder,  required  for  the  ACD  project,  utilized  dual 
cascaded  half-adders  to  perform  the  full-adder  function.  The  design  of  this 
adder  was  nearly  identical  to  the  DP2  8-bit  adder,  with  the  exception  that  no 
skewing  or  de-skewing  delays  were  included.  The  first  version  of  this  adder 
was  incorporated  onto  the  ACDO  test  mask. 

The outputs on the ACDO 10 bit array were transferred into charge-comparator
latches. Due to design problems in these comparator circuits, it was impossible
to get two of them to function at the same bias voltages, which made it
impossible to get meaningful test data on the adder array.

The identical adder was placed on the revised (ACD2) mask with the
troublesome charge-comparator latch circuit replaced with NMOS analog source
follower output buffers.

Sufficient testing of the 10 + 10 bit adder was performed to verify that
the layout was correct and that the carry charge packet would propagate through
all ten stages. The ACD2 10-bit adder consists of 10 full-adders formed
from dual cascaded half-adders as shown in Figure 4-27.

Functional testing was performed by making one input word all "1"s, the
second input word all "0"s, and varying the input carry-bit between a "1"
and a "0". Setting the input carry-bit to a "1" results in the output word's
most significant bit becoming a "1" and the other bits a "0", while setting the
input carry bit to a "0" results in the output word's most significant bit
becoming a "0" and the other bits a "1". This data pattern exercises the
full scale of the adder. This result is graphically seen in the photographs
of Figure 4-28. The top trace of the photograph is the input carry bit
alternating between a "0" (low level) for 14 cycles and a "1" (high level) for
two cycles. Seven cycles later the least significant bit (S1) of the output
word switches between a "1" and a "0" in response to the input signals. The
output of each successive bit is delayed by 1 cycle to account for the
propagation of the carry bit through the adder, until 17 cycles after the
initial input signal, the two most significant bits (S10 and S11) of the output
word switch to their correct positions. Thus the 10 bit adder is demonstrated
as working successfully. Tests were performed over a frequency range of
50 kHz to the 1.5 MHz upper limit imposed by the probe station and available
test equipment. A photograph of the 10-Bit + 10-Bit adder is shown in Figure
4-29.
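
The logic of this test pattern can be reproduced with a short behavioral
sketch. It only checks the expected output bits S1 through S11 for the two
carry-in values; the cycle-by-cycle carry propagation delays seen in the
photographs are not modeled, and the function name is illustrative.

```python
def adder_10_bit(a, b, carry_in):
    """Return output bits [S1, ..., S11] of a 10 + 10 bit adder (S1 = LSB)."""
    total = a + b + carry_in
    return [(total >> i) & 1 for i in range(11)]


ALL_ONES = (1 << 10) - 1  # first input word: ten "1" bits
ALL_ZEROS = 0             # second input word: ten "0" bits

with_carry = adder_10_bit(ALL_ONES, ALL_ZEROS, 1)
without_carry = adder_10_bit(ALL_ONES, ALL_ZEROS, 0)
```

With the carry-in at "1" only S11 is set; with the carry-in at "0", S1 through
S10 are set and S11 is clear, so every carry stage must work for the pattern
to appear correctly.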

4.3  EVOLUTION  OF  MULTIPLIERS 

The following sections describe the evolution of the DCCD multiplier
development effort from the first DP1 designs to the latest ACD designs.

4.3.1 The DP1 Multiplier Arrays

All of the multipliers described in this section are parallel multipliers
that operate in the manner described by C. S. Wallace.

Figure 4-27. ACD2 10-Bit + 10-Bit Adder Array.

The DCCD operations used to multiply two 3-bit binary numbers are
performed in the following manner.

                        a3     a2     a1
                        b3     b2     b1
                      ------------------
                      a3b1   a2b1   a1b1
               a3b2   a2b2   a1b2
        a3b3   a2b3   a1b3
 ---------------------------------------
  p6     p5     p4     p3     p2     p1

The nine summands must be formed with logic AND gates and then added
by columns (with carries) to form the product p6, ..., p2, p1.
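
The column-wise procedure can be sketched in a few lines. The sketch forms the
nine summands with AND operations and then adds each column together with the
carries from the previous column. It is a logic-level illustration only; the
function name is an assumption, and the specific half-adder and full-adder
arrangement of the DCCD arrays is not modeled.

```python
def multiply_3x3(a, b):
    """Multiply two 3-bit numbers by AND summands and column addition."""
    a_bits = [(a >> i) & 1 for i in range(3)]  # a1 (LSB), a2, a3
    b_bits = [(b >> j) & 1 for j in range(3)]  # b1 (LSB), b2, b3

    # Column k collects the summands that contribute to product bit p(k+1).
    columns = [[] for _ in range(6)]
    for i in range(3):
        for j in range(3):
            columns[i + j].append(a_bits[i] & b_bits[j])

    product = 0
    carry = 0
    for k, column in enumerate(columns):
        total = sum(column) + carry
        product |= (total & 1) << k  # product bit for this column
        carry = total >> 1           # carries passed to the next column
    return product
```

Exhaustive evaluation over all 64 input pairs is a convenient check of the
column arithmetic.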

The DP1 3 x 3 bit multipliers, designed in November 1975, were p-channel,
metal-polysilicon gate, 7.6 µm geometry devices. A block diagram
of a 3 x 3 bit multiplier designed using AND gates and half-adders is shown
in Figure 4-30. Also, a block diagram of the same multiplier implemented
using AND gates and a mixture of half-adders and full-adders is shown in
Figure 4-31. Photographs of these multipliers are shown in Figure 4-32
(half adder version) and Figure 4-33 (half adder and full adder mixture).

In February 1976 testing began on the 3 x 3 bit multiplier designed from
a mixture of half-adders and full-adders. Two mask errors were found which
limited the operation of the multiplier array. One error was a very narrow
break in the metal connection to an input injection diode which prevented
the a1 input (LSB) from operating. The second error was a missing contact
to a polysilicon gate in the sum channel of the most significant full adder.
However, since the most significant bit carry output functioned correctly it
was possible to infer proper logic operation of all cells in the array.

Correct logic operation of the entire multiplier array was demonstrated by
keeping the a1 input constant at binary "0" and exercising the other five
inputs through all possible combinations.

There are six parallel outputs from the multiplier array, so that
obtaining a meaningful photograph showing simultaneous outputs is quite
difficult. For simple input combinations where only four outputs are required,
the output pulse waveforms are very similar to those shown previously
for adder arrays. However, when all logic cells are operational and
charges have to propagate through the MSB chain of full adder cells, there
is some deterioration of the charge and a noticeable difference in the output
pulse amplitudes depending upon bit position. Nevertheless, complete
functional operation was achieved. Full test results on this multiplier
were published (6) in September 1976.

Figure 4-30. 3 x 3 Bit Multiplier using half adders and "or" circuits.

Figure 4-32. Half Adders.

Figure 4-33. Full and Half Adders.

The  3  x  3  bit  multiplier  designed  entirely  from  half-adders  was  never 
successfully  demonstrated.  This  was  due  to  the  additional  number  of  charge 
transfers  required  by  this  approach,  the  large  number  of  channels  OR'd  for 
the  most  significant  bit,  and  some  design  errors  discovered  in  the  deskewing 
logic  clocking  scheme. 

4.3.2  The  DP2  Multiplier  Arrays 

In April 1976 we started on the design of the DP2 test matrix. In this
design, a change from a metal-polysilicon gate structure to a
double-polysilicon structure was made.

As with the DP1 designs, the DP2 mask contained two versions of the
3 x 3 bit multiplier. One multiplier employed cascaded dual half-adders,
as shown in Figure 4-34, and contained automatic carry refresh
capabilities. The other multiplier, designed from full-adders and
half-adders as shown in Figure 4-35, was designed without the refresh
capability. Figure 4-34 shows that the generation of p1 (LSB) only requires
an AND gate, the generation of p2 requires a half-adder, and the generation
of p3 and p4 both require three half-adders. This version of the 3 x 3 bit
array utilizes nine AND gates, nine half-adders, three OR gates, seventeen
1-bit shift registers and six output buffers. There is a pipeline delay of
four clock phases through this 3 x 3 bit multiplier. A photograph of the
DP2 3 x 3 bit multiplier designed from cascaded half-adders is shown in
Figure 4-36.

Testing performed on the cascaded half adder version of the DP2 3 x 3
bit multiplier verified correct operation for all combinations of input
states. Oscilloscope photographs of the multiplier output for several different
input data combinations are shown in Figures 4-37, 4-38, 4-39, 4-40, 4-41 and
4-42. In these figures, a_i and b_i represent the input bits and p_i represents
the bits of the output product.

Figure 4-37. Input to the 2-Word, 3-Bit Multiplier.

Figure 4-38. Input to the 2-Word, 3-Bit Multiplier (a_i = 0 1 0).

The multiplier designed from a mixture of single binary level full-adders
and half-adders was never successfully demonstrated.

4.3.3  The  DP3  Multiplier  Array 

The design and layout of an 8 x 8 bit multiplier for the DP3 test mask
was begun in July 1976. The 8 x 8 multiplier requires many more operations
than its DP2 3 x 3 bit predecessor, as shown in Table 4-2.

The 64 summands are formed by 64 AND gates and then added by columns
(with carries) to form the product p16, ..., p2, p1. Note, for example,
that if all inputs are a binary "1" then two carries are formed in the
generation of p3, so that a total of six entries must be added to generate p4.

This  increase  in  entries  continues  until  we  reach  p^;  this  column  has  seven 
summands  plus  four  carries  resulting  in  a  total  of  eleven  entries.  Thus, 
column pg requires five cascaded half-adders or ten half-adders for
implementation.

A block diagram of the DP3 8 x 8 bit multiplier is shown in Figure 4-43.
This array requires 64 AND gates, 111 half-adders, 48 OR gates, 154 charge
refresh cells, 466 single-bit shift-registers and 16 output buffers. This
8 x 8 bit multiplier exhibits a pipeline delay of 32 clock periods. A
photograph of the DP3 8 x 8 bit multiplier is shown in Figure 4-44.

Although the carry output charge packet of the DP3 full-adder design
contained the automatic refresh feature, the sum outputs remain attenuated
by transfer losses. In the longest column of the DP3 multiplier array, a
sum charge packet propagates through seven adders, with each full-adder
requiring four transfers. Prior test results from the DP1 arrays showed
that when a charge packet is attenuated by more than eight transfers, the
arithmetic functions become unreliable. Based on this data, the
decision was made to restore the sum output charge packet after two levels of
binary addition and restore the sum bit three times in each column.
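
The reasoning behind the refresh spacing can be illustrated with a simple
loss model. The sketch assumes that every transfer retains a fixed fraction
of the charge packet; the efficiency value used below is hypothetical, since
the measured transfer efficiency is not given in this section.

```python
ADDERS_IN_LONGEST_COLUMN = 7
TRANSFERS_PER_FULL_ADDER = 4
MAX_RELIABLE_TRANSFERS = 8  # limit inferred from the earlier array tests

total_transfers = ADDERS_IN_LONGEST_COLUMN * TRANSFERS_PER_FULL_ADDER


def remaining_fraction(transfers, efficiency):
    """Fraction of a charge packet left after a number of transfers.

    Assumes each transfer keeps the same fraction `efficiency` of the charge.
    """
    return efficiency ** transfers


# Hypothetical per-transfer efficiency, chosen only for illustration.
ASSUMED_EFFICIENCY = 0.99

unrefreshed = remaining_fraction(total_transfers, ASSUMED_EFFICIENCY)
between_refreshes = remaining_fraction(MAX_RELIABLE_TRANSFERS, ASSUMED_EFFICIENCY)
```

Two levels of full-adders correspond to eight transfers, which matches the
stated limit and is consistent with restoring the sum packet after every two
levels of addition.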

Figure 4-43. Schematic diagram of the 8-bit x 8-bit multiplier.

A half-adder cell was modified to form a charge refresh cell by connecting
one input to a constant logic "1" value. When a degraded (from transfer loss
effects) charge packet is transferred into the other half-adder input, the
carry-out port of the half adder provides a fully restored charge level. This
simple charge restoring cell measures 5.4 x 2.1 mils. In this version of
the restoring cell, the usual half-adder sum output is connected to a sink diode.
However, in a more complex layout, the sum output can be taken to an output FET
source-follower to provide an "A-out" test point. This logic cell measures
7.9 x 3.9 mils and introduces two extra transfers. It was used in the array
after four levels of addition where the column density is reduced from 43 at
the first level of restoration to 29 at the second level.
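
The refresh principle can be sketched as follows. The model treats the
half-adder as an ideal logic cell and represents the degraded packet as a
fraction of a full packet; the detection threshold of one half is an
assumption made for illustration and is not taken from the device data.

```python
FULL_PACKET = 1.0
DETECTION_THRESHOLD = 0.5  # assumed level separating logic "0" from logic "1"


def half_adder(a, b):
    """Return (sum, carry) for two single bits."""
    return a ^ b, a & b


def refresh_cell(charge):
    """Restore a degraded charge packet using a half-adder with one input at "1".

    The carry output equals the logic value of the incoming packet and is
    emitted at the full packet level. The sum output is discarded, as it is
    when connected to a sink diode.
    """
    logic_in = 1 if charge >= DETECTION_THRESHOLD else 0
    _sum_out, carry_out = half_adder(logic_in, 1)
    return FULL_PACKET * carry_out
```

A packet degraded to 0.7 of full scale is returned at 1.0, while a fat-zero
level such as 0.1 is returned as an empty packet.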

Since  the  design  of  the  DP3  8x8  bit  array  used  the  same  logic  cells  as 
the  16  +  16  bit  adder  previously  described  in  Section  4.2.4,  it  also  had  the 
identical  race  condition  and  was  untestable.  Due  to  the  tight  development 
schedule,  the  decision  was  made  not  to  correct  the  design  in  order  to  proceed 
with  the  design  of  the  DP4  test  mask. 

4.3.4  The  DP4  Multiplier  Array 

The design of the DP4 16 x 16 multiplier began in May 1977. The DP4
multiplier was a 3-phase, n-channel, double-polysilicon overlapping gate
structure. Up until this design, all previous DP series multipliers had been
2-phase, p-channel. The complexity of the 16 x 16 bit multiplier can be seen
from Table 4-3.

A considerable amount of time was spent in digitizing large standard
building blocks from four levels of full-adders and attempting to interconnect
the cells without adding extra pipeline delays. A refresh cell imposes two
additional pipeline delays on the throughput of the multiplier. It was
calculated that, at the system speed requirement of a 5 MHz clock rate, we
would have to refresh the SUM channels between each level of 4 cascaded
full-adders. Thus, there would be six levels of refresh required,
corresponding to a total pipeline delay of thirty-nine clock periods.

Layout of the cells and interconnection of the DP4 16-bit x 16-bit
multiplier continued until September 1977 when it became evident that the DP4
design was much more complex than originally envisioned. Work on the DP4 mask
was stopped when it was concluded that the probability of successfully
completing the layout within the program cost and schedule constraints was
very small.

Table 4-3. Algorithm for a 16-bit x 16-bit multiplier.

In retrospect it appears that although the Wallace algorithm as shown
in Tables 4-2 and 4-3 may be suitable for implementation in digital CCD
logic up to 8-bits x 8-bits, it becomes too complex to implement for larger
numbers. Possibly the algorithm proposed by Robert Logan, in which
high-speed multiplication by summing squares is described, could be more
simply implemented.

4.3.5  The  Azimuth  Correlator  Device  (ACD)  Multiplier  Array 

A complex multiplication was required for the ACD project (described in
Section 9.3). This requirement was implemented with 4 separate, but identical,
multipliers. All of the previous multiplier designs performed a magnitude
only product. The ACD system required a 6 x 4 bit 2's complement multiplier.
The algorithm for performing this multiplication is shown in Table 4-4.

Table 4-4. Multiplication of 6 x 4, 2's Complement Numbers
Producing a Positive Product


The 3-phase, n-channel half-adder designed for the NE-1 chip was modified
for 4-phase clocks and the transfer lengths of the gates reduced to meet the
3.5 MHz ACD clock speed requirement.

The ACDO (first test mask) multiplier was implemented entirely from
half-adders in a nearly identical manner to the 3 x 3 bit multiplier described
in Section 4.3.1. The single half-adder was chosen as the basic building block
of the multiplier because of the speed with which the cell could be
step-and-repeated into an array in the CAD system. In practice we found that
the attendant custom skewing delay layout required more time and effort than
that saved by the cell step-and-repeat procedure.

A schematic diagram of the ACDO 6 x 4 bit multiplier is shown in Figure
4-45. As seen from this diagram, there are four refresh levels required in
this multiplier and a pipeline delay of twenty-six clock periods for the most
significant output product. This multiplier uses 21 AND gates, 37 charge
refresh cells, 85 half-adders, and measures 100 x 82 mils. Deskewing of
the output product bits was not performed. The output products are
transferred directly into charge-comparator latches. Design problems in these
charge comparator amplifiers made it impossible to get two of them to function
from the same voltage clock lines. This made it impossible to meaningfully
test the ACDO multiplier.

4.3.6  The  ACD2  Multiplier  Array 

The  revised  version  (ACD2)  of  the  6-bit  x  4-bit  2's  complement  multiplier 
utilized  dual  cascaded  half-adders  as  shown  in  the  schematic  diagram  of  Figure 
4-46.  As  can  be  seen  from  this  diagram,  only  one  level  of  charge  refresh  is 
required  and  only  10  pipeline  delays  are  required  for  the  most  significant 
product. This multiplier contains 21 AND gates, 6 charge refresh cells, and
35 half-adders, measures 54 x 52 mils, and requires only one third the
area of the ACDO multiplier. Deskewing of the product was still not performed.
However, to avoid the ACDO charge comparator amplifier problem, the output
product charge packets were transferred into NMOS analog source follower output
buffers.  This  multiplier  was  tested  in  September  1979  and  performed  correctly 
over  the  frequency  range  50  kHz  to  500  kHz.  Higher  frequency  operation  is 
expected  from  packaged  devices.  A  photograph  of  the  ACD2  6x4  bit  multiplier 
is  shown  in  Figure  4-47. 
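
The area comparison between the two multipliers can be checked from the
quoted layout dimensions. The sketch below is only an arithmetic check; the
function and variable names are illustrative and do not come from the
original design documentation.

```python
def area_sq_mils(width_mils: float, height_mils: float) -> float:
    """Return the rectangular layout area in square mils."""
    return width_mils * height_mils


# Dimensions quoted in the text for the two 6x4 bit multipliers.
first_mask_area = area_sq_mils(100, 82)  # first test mask multiplier
acd2_area = area_sq_mils(54, 52)         # revised (ACD2) multiplier

ratio = acd2_area / first_mask_area
print(f"ACD2 area / first-mask area = {ratio:.2f}")
```

The ratio comes out close to one third, in line with the statement above.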

Figure 4-47. ACD2 6 x 4 Multiplier.

4.4 FAST HADAMARD TRANSFORM (FHT) CELL

An interconnect diagram of the full-up FHT chip (FHT-1) is shown in
Figure 4-48. The blocks labeled A1-12 through F1-12 are the Hadamard
functional blocks. All A blocks are identical, as is the case with the B, C, D,
E, and F blocks. The interconnect within the A-block is shown in Figure 4-49
along with a listing of the fundamental circuits comprising the block. Note
that the following four basic circuits make up the A-FHT block: shift register,
half adder, charge fan-out (CFO) and charge transfer node (CTN). A more
complete description of the CTN is provided later in this section. The CFO
is a device similar in structure to the half adder but different in function
in that it generates two logical replicas from a single logic input. Figure
4-50 shows a signal timing diagram for the A1 block. This is the simplest
block, as it produces the sums and differences of the input data, i.e., a + b,
a - b, c + d, c - d, etc. The B blocks take this information and produce more
complex results, (a + b) + (c + d), (a + b) - (c + d), (a - b) + (c - d),
(a - b) - (c - d), etc. This sequence of sums and differences is controlled by
the distributed multiplexer, denoted C1B000, in Figure 4-49. It is called
distributed because it appears in each functional block. The multiplexing
is done with signal charge, allowing it to propagate through the array
synchronously with the data that it controls.
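
The sums and differences formed by the A and B blocks are the first two
stages of a fast Hadamard transform. The sketch below illustrates that
arithmetic at the word level only. It is not a model of the bit-serial
charge-domain hardware, and the function names and the output ordering are
assumptions made for illustration.

```python
def butterfly_stage(values, span):
    """One sum/difference stage.

    Elements `span` apart inside each group of 2*span are replaced by
    their sum and their difference.
    """
    out = list(values)
    for start in range(0, len(values), 2 * span):
        for i in range(start, start + span):
            x, y = values[i], values[i + span]
            out[i] = x + y          # sum path
            out[i + span] = x - y   # difference path
    return out


def fast_hadamard(values):
    """Apply log2(N) sum/difference stages (N must be a power of two)."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(values)
    span = 1
    while span < n:
        out = butterfly_stage(out, span)
        span *= 2
    return out


a, b, c, d = 1, 2, 3, 4
stage_a = butterfly_stage([a, b, c, d], 1)  # a+b, a-b, c+d, c-d
stage_b = butterfly_stage(stage_a, 2)       # (a+b)+(c+d), (a-b)+(c-d), ...
print(stage_a, stage_b)
```

A 64 point transform built this way needs six such stages, which would be
consistent with the six block types A through F, although that mapping is an
inference and is not stated explicitly here.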

The  CFO  is  an  important  device  in  the  general  scheme  of  DCCL  since  a 
logic  one  charge  packet  cannot  be  divided  up  without  destroying  the  logical 
significance  of  the  signal.  In  other  words,  a  logic  one  charge  packet  by 
itself  has  a  fan-out  of  one.  The  CFO  takes  care  of  the  situation  where  a 
fan-out  of  two  is  needed  and  accomplishes  this  task  in  one  clock  period. 

The CTN is a single transistor device that was developed to transfer
signal charge efficiently over distances and geometries that prohibit the
use of a CCD shift register. A CTN can be described as a diffused region
terminating the typical CCD channel with a metal line attaching it to another
diffusion some distance away (typically several mils). The latter diffusion
has  an  adjoining  gate  electrode  typically  biased  with  a  D.C.  voltage,  which 
begins  the  next  DCCL  device.  Figure  4-51  shows  a  schematic  timing  diagram 
and  surface  potential  diagram  for  the  CTN  structure.  Signal  charge  is 

Figure 4-48. Block diagram of the 64 point Fast Hadamard Transform chip.

Figure 4-49. Block diagram of A1 through A12 functional block.

adjacent gates. This charge effectively transfers across the interconnect,
out through the latter diffusion and across the silicon surface under the
adjoining D.C. gate. This transfer cycle repeats each clock period as the
CTN  is  driven  by  the  very  gates  that  form  the  signal  paths  it  is  interconnecting. 
Since the CTN consists primarily of metal, and in some cases a diffused tunnel
(for passing signal charge under metal lines), it provides a serviceable
and simple signal transmission system for DCCL devices. CTN's are serviceable
because they permit a wide range of chip layout configurations that would
otherwise be impractical (if not impossible). CTN's are simple because they
permit DCCL devices to be interconnected in virtually the same manner as
MOS transistors.

4.4.1  FHT-0  Test  Chip 

A  test  chip  for  the  FHT-1  has  been  designed  and  is  called  FHT-0.  The 
purpose  of  this  chip  is  to  take  all  the  functional  blocks  that  compose  the 
FHT-1  and  configure  a  test  interface  around  them.  This  way,  the  functional 
blocks can be thoroughly tested and characterized before attempting to
characterize the full-up chip. The test configuration which is of primary
interest is the A1 FHT functional block. The test configuration for this is
called the A-FHT.

4.4.2 A-Hadamard Functional Block

The  shortest  functional  block  (A-FHT)  has  been  operated  at  the  lowest 
frame  rate  and  found  to  be  completely  functional.  Figure  4-52  shows  a 
functional  block  diagram  of  the  A-FHT  test  arrangement. 

Figure  4-53  shows  a  state  variable  timing  diagram  used  in  testing  for 
the  A-FHT.  The  INI  pattern  shown  there  is  part  of  one  actually  used  in  testing 
and  was  picked  as  one  that  would  exercise  all  the  signal  paths  within  the  A-FHT 
cell. Figures 4-54 and 4-55 show photos of the output signals of the A-FHT.
Figure 4-54 has the actual input data pattern on the lower trace and was taken
with all zeros on the MUX IN. The patterns shown as C+, C- and OUT? are
correct  except  that  the  signal  amplitude  at  OUT?  is  quite  low.  The  output 
at  OUT?  is  just  what  the  subtractor  F2A  is  developing  at  its  sum  output  node. 

Figure 4-52. Functional block diagram of the A-FHT cell test arrangement.

Figure 4-53. A-FHT functional block state variable timing diagram. The data
output (OUT1S2) can be algebraically expressed in terms of the input (INI) in
the interest of compact notation: a+b+c+ = C1,S1; a+b+c- = b1,d1;
c+d+c+ = C3,S3; c+d+c- = b3,d3, where S = sum, C = carry, d = difference,
b = borrow.


Figure 4-55 shows a case where the MUX IN is set with an alternating one-zero
pattern and in the right half shows the same pattern as those of the state
variable diagram (Figure 4-53). The C+, C- and MUX OUT signals in the photo
agree very well with the diagram (in the diagram a logic one is a high level
and in the photos a logic one is a low pulse). As in Figure 4-54 the signal
amplitude is quite low but it essentially agrees with the state variable diagram.
The alternating one-zero pattern at the right side of the photo should appear
as a string of 10 ones except that the AND function degrades these signals below
the maximum zero level of the output CFO (M10000). Some adjustment of the FG
switching threshold is possible, which reveals the presence of the lesser 5
of the 10 ones, but this is nonstandard and causes undesirable noise margin
shifts in the HA.

Three problems have been identified in the A-FHT functional block. The
first problem is that the CFO device develops a considerable bias charge
level at its output. The second problem is that the internal transfer
efficiency is marginal. The third problem is that the AND function needed a
logic one charge packet that was 60% larger than that for the half adder.

Solutions to these problems have been implemented on FHT-OC. Characterization
of FHT-OC will precede fabrication and testing of FHT-1. A more detailed
discussion of these problems and their solutions follows.

4.4.3  Charge  Fan-Out 

The signal Charge Fan-Out (CFO) was operated between 0.2 MHz and 2.0 MHz
with the clocks and bias which properly operate the half adder (HA). Figure
4-56 shows a schematic of the circuit test arrangement consisting of the CFO,
labeled M10000, and the HA, labeled RLA. The mismatch in amplitude of the logic
ones of the CFO was found to be typically 6%. This is quite acceptable for
performing digital logic. A problem with the logic zeros occurs as operating
frequency increases from around 400 kHz: they increase linearly in
amplitude until, at around 2.5 MHz, the typical logic zero equals 50% of a
logic one. This is an unworkable situation for any logic elements which would
follow, since the CFO and HA both have fat zero - minimum one input noise margins
of around 30%-70%. This situation is due to the impedance of the floating gate
channel. Since a "two-logic-ones" charge packet is available under the φ2A
electrode each clock period, this charge must either (1) flow past the
floating gate (FG) or (2) flow under the φ3 electrodes. If, in case (1),
the charge flow past the FG is sufficiently restricted, some of this charge
will be left behind and subsequently be pulled under the φ3 electrodes. This
charge will then be transferred to the output circuits in the test arrangement.
Figure 4-57 shows one of several plots made of the logic one and zero levels
vs. frequency. It clearly shows that, although the logic ones remain stable
and well matched, the logic zero level increases linearly with frequency.
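
The frequency limit implied by these numbers can be estimated with a simple
linear model. The sketch below assumes that the logic zero level is
negligible at the 400 kHz onset and rises in a straight line to 50% of a
logic one at 2.5 MHz; the negligible starting level, the function names and
the use of the 30% fat zero limit as the failure criterion are assumptions
made for illustration.

```python
F_ONSET_MHZ = 0.4    # frequency where the zero level starts to rise
F_REF_MHZ = 2.5      # frequency of the quoted 50% zero level
LEVEL_AT_REF = 0.50  # zero level at F_REF_MHZ, as a fraction of a logic one


def zero_level(freq_mhz: float) -> float:
    """Logic zero level (fraction of a logic one) under the linear model."""
    if freq_mhz <= F_ONSET_MHZ:
        return 0.0  # assumed negligible below the onset frequency
    slope = LEVEL_AT_REF / (F_REF_MHZ - F_ONSET_MHZ)
    return slope * (freq_mhz - F_ONSET_MHZ)


def max_clock_mhz(max_zero_fraction: float) -> float:
    """Highest clock rate that keeps the zero level at or below the limit."""
    slope = LEVEL_AT_REF / (F_REF_MHZ - F_ONSET_MHZ)
    return F_ONSET_MHZ + max_zero_fraction / slope


limit = max_clock_mhz(0.30)  # 30% fat zero limit of the noise margin
print(f"zero level reaches 30% near {limit:.2f} MHz")
```

Under these assumptions the zero level would cross the 30% limit well below
2.5 MHz, which is why the unmodified CFO cannot drive following logic at
that rate.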

Two possible solutions for this situation were considered. One was to widen
the floating gate channel just enough to meet the
maximum speed criterion but not seriously compromise the existing chip layout.
The other was to go to a CFO design using a dynamic N-MOS inverter to switch
the formation of the two output charge packets. While the trade-offs of both
approaches are being examined, widening the floating gate channel was taken
as the best near term solution. Calculations indicate that this will yield
about a 7% bias level for logic zeros at 2.5 MHz. This will be quite
adequate for the FHT-1.

Figure 4-58 shows a plot of normalized fat zero level vs. clock frequency.
The fat zero levels plotted are for FHT-OB measured, FHT-OB calculated by the
charge control method (9), FHT-OC calculated by the charge control method, and
FHT-OC calculated by a linear region transfer (LRT) model. Test results
taken on CFO test configurations of FHT-0 (and others) tend to confirm the
charge control model. The LRT model assumes that charge leaving a potential
well of dimensions W and L over a barrier whose dimensions are Wg and Lg is
governed, to first order, by linear region current flow in a channel of W/L
where Wg/Lg >> W/L. There is not yet any experimental basis that supports
the LRT model.

Figure 4-58. A normalized plot of the fat zero level of an FHT-OB charge fan-out
circuit vs. clock frequency. The curves show both the measured
and the calculated results.


4.4.4 Transfer Efficiency


Output signal patterns revealed marginal efficiency in the transfer paths
that lead to the signal inputs of the two AND logic gates contained in the
A-FHT block shown in Figure 4-52. The solution for this is to minimize the
shift register lengths within the dual half adders and change to n+ diffused
tunnels for several of the CTN's to lower capacitance and thereby increase
operating efficiency. Calculations show that the n+ tunnel CTN will operate
with efficiencies of 80% or better at 2.5 MHz, which should be more than adequate.

4.4.5  AND  Function 

The third problem is that the D-Well (φ1-V3 potential well) of the AND
function is some 60% larger in charge capacity than the D-Well of the HA. To
do the AND operation, two logic charge packets are combined in the D-Well.
In the case of two logic ones, one of them, in effect, scuppers across the
VB barrier potential into a φ1 electrode potential. This is then the
output of the AND function. In the case of the present AND function, the output
as a result of two ones will be a 40% logic one (i.e. a degraded logic zero).
This, upon input to a CFO, will give outputs that are only about 20% logic one.
This is precisely the observed case in the serial outputs of the A-FHT test
configuration. Just at the point in time when logic ones should be observed at
the output of the CFO the observed ones are 20% or less of their full value.
Further testing of the distributed MUX (C1B000) test configuration confirmed
this situation.
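
The 40% figure follows from treating the AND output as the charge that
overflows the D-Well. The sketch below is an idealized charge-balance model
in units of one full logic one packet; the function name and the assumption
of a lossless overflow are illustrative only.

```python
def and_overflow(in_a: float, in_b: float, d_well_capacity: float) -> float:
    """Charge spilling over the barrier after two packets merge in the D-Well.

    All quantities are normalized to one full logic one charge packet.
    """
    return max(0.0, in_a + in_b - d_well_capacity)


HA_SIZED_WELL = 1.0    # D-Well holding exactly one logic one
OVERSIZED_WELL = 1.6   # D-Well some 60% larger, as described in the text

for a, b in ((0.0, 0.0), (1.0, 0.0), (1.0, 1.0)):
    matched = and_overflow(a, b, HA_SIZED_WELL)
    oversized = and_overflow(a, b, OVERSIZED_WELL)
    print(f"inputs {a:.0f},{b:.0f}: matched {matched:.1f}, oversized {oversized:.1f}")
```

With a well sized for one logic one, two ones produce a full-value output;
with the 60% larger well only about 0.4 of a logic one overflows.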

The tentative solution is to use the HA for performing the AND function
by using the carry output and discarding the sum output. There is the
disadvantage that the HA requires more area than the present AND
configuration. However, this is more than offset by the fact that the carry
output is a refreshed signal, i.e. logic ones are full value and logic zeros
are empty charge packets. Figure 4-59 shows for comparison the logic level
truth tables of the HA AND and the simple AND functions (assuming equivalent
D-Well capacities and an input noise margin of 30-70).

Figure 4-59. Comparative logic level truth table (a) and corresponding
plot (b) for the half adder AND (output = x) and the simple
AND (output = y) functions. Inputs (i+j) and outputs are
normalized.


Table  4-5  summarizes  the  progress  in  FHT-OB  test  and  characterization. 
Table  4-6  gives  the  major  design/layout  changes  contemplated  as  of  this 
report  date  for  the  final  test  chip  (FHT-OC). 

FHT-OB HIGH LEVEL BRIEFING


4.5  MEMORIES 


The Advanced Memory Study was completed in July 1976; this program and the
one immediately preceding it were directed toward
investigation of the offset gate CCD memory structure. A detailed description
of the offset gate device, including its theory of operation, basic
geometry, topology and topography considerations, operating characteristics, and
mask set and process evolution, is provided in Section 3.6 of this report.

The offset gate device study program included fabrication of shift
registers of various sizes and organizations to provide an accurate and
direct comparison of the offset gate configuration with a standard or
conventional shift register CCD memory. One of the most noteworthy accomplishments
of this program was the design and development of a 16 kilobit
serial-parallel-serial (SPS) memory building block that was used as the major
test element. The LSM-3 mask set included a 7.5 micron design, as well as a
5.0 micron design, that provided bit densities of 225 and 100 square
microns/bit, respectively, exclusive of the
area required by the output circuits.
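
To put these densities in perspective, the sketch below converts the quoted
area per bit into bits per square millimeter and into the storage area of a
16 kilobit block. It assumes that "16 kilobit" means 16,384 bits and, like
the figures above, it excludes the output circuits; the function names are
illustrative.

```python
BITS_PER_BLOCK = 16 * 1024  # assumption: "16 kilobit" taken as 16,384 bits
UM2_PER_MM2 = 1.0e6         # square microns in one square millimeter


def bits_per_mm2(um2_per_bit: float) -> float:
    """Storage density in bits per square millimeter."""
    return UM2_PER_MM2 / um2_per_bit


def block_area_mm2(um2_per_bit: float, bits: int = BITS_PER_BLOCK) -> float:
    """Storage area of one memory block in square millimeters."""
    return bits * um2_per_bit / UM2_PER_MM2


for design, cell_area in (("7.5 micron", 225.0), ("5.0 micron", 100.0)):
    print(design, round(bits_per_mm2(cell_area)), "bits/mm^2,",
          round(block_area_mm2(cell_area), 2), "mm^2 per block")
```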

The 7.5 and 5.0 micron offset gate memory blocks were produced and tested
under a variety of conditions. Experimental data indicated that the LSM-3
process sequence produced uniform and repeatable SPS memory blocks that gave
predictable results. Further evaluation of the offset gate configuration
indicated that the complex geometry of this design would significantly
reduce memory cell yields. It was anticipated that further
improvements in photolithographic techniques would act as a forcing function
toward producing offset gate devices with higher bit densities. Development
of an offset gate structure with bit densities greater than that obtained by
the 5.0 micron SPS memory block was expected to require significant changes
to the processing sequence.

The  offset  gate  memory's  internal  organization  was  also  re-examined  as 
additional  data  was  accumulated  on  device  operating  characteristics.  At  the 
beginning  of  the  advanced  memory  study  program,  simple  serial  shift  registers 
were  considered  as  the  primary  candidates  for  implementation  of  large  bulk 
delay  memories.  It  is  worth  repeating  that  the  ultimate  goal  of  CCD  memories 
was  their  use  as  ultra-high  density  large  bulk  memories.  The  initially  proposed 
serial  shift  register  memory  organization  was  considered  to  be  appropriate  at 
that  point  in  time  for  several  reasons: 

(1) System studies indicated that the overall system bit error
rate and system reliability could easily be maintained at
acceptable levels.

(2)  CCD  clocking  schemes  appeared  to  be  quite  simple. 

(3)  Very  low  power,  effective  bit  refreshing  circuits  appeared 
to  be  feasible. 

As time elapsed, more experimental data made it clear that the tradeoffs
shifted in favor of a serial-parallel-serial (SPS) organization for the basic
memory unit or building block. The SPS organization placed less stringent
requirements on the CCD itself by lessening the need for achieving high
transfer efficiency.

The next major innovation in CCD memories occurred during 1979; this
involved the design of 1.2 kilobit and 2 kilobit interleaved SPS memories
that significantly increased bit density. In essence, data was clocked into
the serial register and shifted into the "A" parallel stream; this was followed
by the input of additional data into the serial register that was then clocked
into the "B" parallel stream. With both "A" and "B" data inputs loaded into
the interleaved parallel registers, the clocking sequence permitted simultaneous
shifting of data through both interleaved parallel registers, with sequentially
clocked output of the data through the output serial register.
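
The interleaved load and unload sequence can be pictured with a small
behavioral model. The sketch below is only a data-flow illustration; the
class name, the placement of the "A" stream on even columns and the "B"
stream on odd columns, and the A-then-B output order are assumptions and do
not describe the actual chip layout or clocking.

```python
from collections import deque


class InterleavedSPS:
    """Behavioral sketch of an interleaved serial-parallel-serial memory."""

    def __init__(self, width: int, depth: int):
        self.width = width  # bits held by the serial register
        # Each parallel row holds two interleaved serial loads (A and B).
        self.rows = deque(([0] * (2 * width) for _ in range(depth)),
                          maxlen=depth)

    def cycle(self, block):
        """Load one A load plus one B load and return the row shifted out."""
        if len(block) != 2 * self.width:
            raise ValueError("block must contain 2 * width bits")
        a_stream = block[:self.width]   # first serial register load
        b_stream = block[self.width:]   # second serial register load
        new_row = [0] * (2 * self.width)
        new_row[0::2] = a_stream        # "A" columns (assumed even)
        new_row[1::2] = b_stream        # "B" columns (assumed odd)
        leaving = self.rows[-1]         # row reaching the output register
        self.rows.appendleft(new_row)   # all rows shift simultaneously
        # The output serial register clocks out A, then B, in sequence.
        return leaving[0::2] + leaving[1::2]


memory = InterleavedSPS(width=3, depth=2)
data = [1, 0, 1, 1, 1, 0]
first = memory.cycle(data)
second = memory.cycle([0] * 6)
third = memory.cycle([0] * 6)
print(first, second, third)
```

In this model the data reappears at the output, in its original order, after
a delay equal to the depth of the parallel section.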

A number of unique concepts were generated during the course of this
memory study program; these concepts included the Yield Enhancement Study (YES),
which discussed the possibility of deleting defective memory blocks from a
large scale array by means of programmable fuse links. These fuse links can be
programmed and blown open to disconnect defective memory cells from the large
scale memory array. The fuse link technique suggested is similar in concept
and design implementation to that employed by PROM suppliers and users.

An Entire Slice Processing (ESP) concept was also advanced as a method
of producing billion-bit memory arrays. In essence, an entire wafer would be
comprised of interconnected memory cells that could be tested following
fabrication to determine the row and column location of defective memory cells.
The defective cells would then be removed from the array by blowing fusible
links of nichrome or titanium-tungsten, or blowing diodes inserted into each
conductor lead, to ensure the removal of unwanted memory cell outputs.


5.0  RADIATION  TEST  RESULTS 


5.1  INTRODUCTION 

The three commonly specified nuclear radiation environments for military
and spacecraft applications of semiconductors are (1) prompt-pulse ionizing
radiation, (2) the total dose of ionizing radiation, and (3) high energy
neutron irradiation. The TRW NE1 CCD's were evaluated for the effects of
the first two of these environments. It had been intended to test in the
neutron environment but test time did not permit.

The  very  nature  of  signal  flow  within  CCD's,  that  is,  with  packets  of 
charge rather than voltages and currents, makes them probably the most
functionally sensitive semiconductor devices to prompt-ionizing radiation. A
prompt-ionizing  dose  of  as  little  as  1  Rad  (Si)  produces  sufficient  charge  in 
a  CCD  to  swamp  out  any  stored  information.  Thus,  in  the  prompt  pulse  tests, 
the  objective  was  to  determine  the  threshold  of  data  upset  in  the  test  devices. 

Since CCD's are of metal-oxide-silicon (MOS) construction, they are
susceptible to the effects of trapped oxide charge produced by large total doses
of ionizing radiation. The predominant effect is the shift in device
threshold voltage as observed in other MOS devices. Other effects of concern on
the CCD due to total ionizing dose include degradation of the intrinsic
parameters such as charge transfer efficiency and dark current.

The  test  devices  used  in  the  radiation  tests  were  a  16-stage  shift 
register  and  a  half-adder.  One  of  each  of  these  devices  is  contained  on  an 
NE1  die.  The  die  were  mounted  in  40-pin  flat  packages  with  the  leads  bonded 
to  either  a  shift  register  or  half-adder.  Each  die  also  contains  several 
independent  MOSFET's  which  are  used  for  threshold  voltage  measurements. 

Three  of  these  FET's  were  also  bonded  to  package  leads  in  both  the  shift 
register  and  half  adder  packages.  The  bonding  diagrams  for  the  half-adder 
and  the  shift  register  are  shown  in  Figures  5-1  and  5-2,  respectively.  The 
test  FET's  3,  4,  and  10  use  common  gate  and  common  drain  connections  to 
package  pins  with  separate  pins  for  each  source. 

A  summary  of  the  test  results  is  provided  in  Table  5-1. 

Table 5-1. NE1 Radiation Test Result Summary

Radiation Test        Test Circuit   Criteria                         Results

Prompt Ionizing       16 Bit Shift   Upset Threshold: Visual          ~8 x 10^4 RAD(Si)/Sec
(Range: 4 x 10^4 to   Register       identification of disturbance
3 x 10^7 RAD(Si)/Sec)                in the device output waveforms.

                                     Saturation: Dose rate at which   ~1 x 10^7 RAD(Si)/Sec
                                     test device has all bits at
                                     logic "1".

                                     Recovery Time: Time required     Pf = 100 kHz
                                     to restore output waveforms to   Upset: 160-320 µsec
                                     their undisturbed condition.     Saturation: 1-4 msec

Total Ionizing Dose   16 Bit Shift   Charge Transfer Inefficiency     0.8 x 10^-7 to
                      Register       Increase                         4.2 x 10^-7/RAD(Si)*
                                                                      (10^4-10^5 RAD(Si) Range)

                      FET's          Threshold Voltage                -13 µV/RAD(Si)

                      Half Adder     Correct Logic Function           10^4 RAD(Si)*

*With Bias Voltage Readjustments Between Exposures.

Figure 5-1. Half-adder bonding diagram.


5.2  PROMPT-IONIZING  RADIATION  UPSET  TESTS 
5.2.1  Test  Setup  and  Discussion  of  Results 

Five  CCD  shift  registers  were  tested  in  the  prompt-ionizing  radiation 
environment  for  observation  of  any  upset  to  device  data  inputs.  The  tests 
were  conducted  in  the  TRW  Febetron  flash  x-ray  facility  on  March  28  through 
March  30,  1978. 

The  devices  were  operated  in  two  modes  of  data  input:  (1)  all  zeros 
input,  and  (2)  an  alternate  pattern  of  ones  and  zeros.  Thus,  the  objective 
of  these  tests  was  to  observe  the  perturbation  of  the  logical  one  and  zero 
levels  as  outputted  from  the  device. 

The test set-up used is shown in Figure 5-3. The test sample was inserted
into an electromagnetically shielded test fixture which was placed at various
distances from the Febetron flash x-ray. The clock generator, data pattern
generator, and DC bias supplies were located in the exposure room to minimize
cable length to the test fixture. A line driver was used to buffer the
device output signal to the oscilloscope in the instrumentation room. All
equipment in the exposure room, except the test fixture, was shielded with
lead bricks and placed out of the direct field of the Febetron to minimize
radiation-induced noise pickup.

Because of the high inherent susceptibility of CCDs to ionizing
radiation, it was found necessary to place lead shielding around the test fixture
to  reduce  the  dose  rate  impinging  on  the  device.  With  the  lead  shielding 
and  enough  distance  from  the  Febetron  target,  the  dose  rate  level  at  which 
upset  just  occurs  could  be  determined. 

The devices were irradiated over a range of 4 x 10^4 to 3 x 10^7 Rads(Si)/
sec. Oscilloscope photographs of device output response are presented in
Appendix A. The upper trace of each photo shows the long-term response (50
to 200 microseconds/division) and the lower trace shows the short-term
response (20 to 50 microseconds/division) of the same output. All the samples
were also irradiated with an 8-ones and 8-zeros input pattern.

Figure 5-3. Shift Register Upset Test Set-Up


In general, for all samples tested, noticeable deviation of the device
output data pattern begins to occur at a dose rate of about 8 x 10^4 Rads(Si)/
sec. For most of the sample photos taken at low dose rate, a full one-level
pulse  is  observable  in  the  15th  and/or  16th  data  bit  position.  These  error 
pulses  cannot  be  explained  completely,  but  are  presumed  to  be  generated  by 
the  flash  x-ray  electromagnetic  noise  which  couples  to  the  data  input  terminal 
of  the  device  or  by  photocurrent  produced  in  the  inject  diode  junction.  The 
true  radiation  response  of  the  devices  is  the  gradual  rise  to  saturation  with 
dose  rate  of  the  logical  zero  bits  in  the  data  pattern.  This  is  the  result 
of  free  charge  generated  in  the  silicon  substrate  which  fills  the  potential 
wells  which  transport  the  data  through  the  device. 

The  most  pronounced  filling  of  the  potential  wells  by  the  radiation- 
induced  charge  appears  to  be  localized  towards  the  center  of  the  device's  length. 
This  is  seen  in  the  low  dose  rate  photographs  where  the  middle  data  bits 
are  raised  toward  the  logical  one  level  more  than  the  beginning  and  ending  bits. 
This  phenomenon  can  be  explained  by  the  higher  concentration  of  excess  charge 
occurring  near  the  center  of  the  silicon  substrate  as  charge  near  the  ends  is 
swept away more rapidly. As dose rate is increased, the output deviation
appears to increase linearly until saturation of all the samples tested
occurred at approximately 1 x 10^7 Rads(Si)/sec. Dose rates above this level
resulted in no significant increase of device recovery time.

On sample number 5, the effect of clock frequency on device recovery time
was tested. When the clock frequency was switched from the normal 100 kHz to
200 and 300 kHz, the recovery times were reduced by the same factors. This, of
course, is due to the excess charge being swept out of the device faster.
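
This scaling is what would be expected if recovery takes a fixed number of
clock periods. The sketch below converts the 160 to 320 microsecond upset
recovery range listed in Table 5-1 for the 100 kHz clock into clock periods
and rescales it to the other two clock rates. The fixed-period assumption and
the function names are illustrative and are not measured data.

```python
def clock_periods(time_us: float, clock_khz: float) -> float:
    """Number of clock periods contained in a time interval."""
    return time_us * clock_khz / 1000.0


def recovery_time_us(periods: float, clock_khz: float) -> float:
    """Time needed to clock out a fixed number of periods."""
    return periods * 1000.0 / clock_khz


# Upset recovery range at the normal 100 kHz clock (Table 5-1).
low_periods = clock_periods(160.0, 100.0)
high_periods = clock_periods(320.0, 100.0)

for clock in (100.0, 200.0, 300.0):
    low = recovery_time_us(low_periods, clock)
    high = recovery_time_us(high_periods, clock)
    print(f"{clock:.0f} kHz: {low:.0f} to {high:.0f} microseconds")
```

At 100 kHz the quoted range corresponds to 16 to 32 clock periods, that is,
one to two passes through the 16-stage register.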

Three  of  the  samples  (serial  numbers  6,  9,  and  15)  were  also  irradiated 
with  an  input  data  pattern  consisting  of  eight  ones  alternated  with  eight 
zeros. The same general effects as observed with an all-zero input pattern
were  seen  here.  Sample  number  9,  which  was  not  operated  at  full  well 
capacity,  is  seen  to  have  its  logical  one  level  raised  as  well  as  the  zero 
level  by  the  ionizing  radiation. 


No  noticeable  effect  on  device  operation  was  observed  several  seconds 
after  exposure  to  the  Febetron  radiation.  This  is  to  be  expected  since 
the  total  ionizing  dose  on  each  sample  did  not  exceed  1  or  2  Rads(Si). 

5.2.2  Prompt-Ionizing  Radiation  Upset  Test  Data 

Figures  5-4  through  5-12  represent  typical  oscilloscope  photographs  of 
the  CCD  shift  register  response  to  prompt  ionizing  radiation.  The  term 
"response"  should  be  understood  to  mean  the  effect  on  digital  data  which  was 
in the shift register at the time of the radiation event. The clock period
of the devices (10 microseconds at 100 kHz) is long compared to the duration
of the radiation pulse (25 nanoseconds); thus, the photographs actually show the
perturbed digital data and the excess radiation-induced charge which are
clocked out of the shift register immediately after the pulse.
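
The dose delivered by a single pulse can be estimated from the dose rate and
the 25 nanosecond pulse width. The sketch below assumes a rectangular pulse,
which is a simplification; the function name is illustrative.

```python
PULSE_WIDTH_S = 25e-9  # flash x-ray pulse duration quoted in the text


def dose_per_pulse_rad(dose_rate_rad_per_s: float,
                       pulse_width_s: float = PULSE_WIDTH_S) -> float:
    """Dose in Rads(Si) delivered by one rectangular pulse."""
    return dose_rate_rad_per_s * pulse_width_s


for label, rate in (("upset threshold", 8e4),
                    ("saturation", 1e7),
                    ("maximum tested", 3e7)):
    print(f"{label}: {dose_per_pulse_rad(rate):.4f} Rads(Si) per pulse")
```

Even at the highest tested dose rate a single pulse delivers less than
1 Rad(Si), which agrees with the observation that the accumulated dose on
each sample stayed within 1 or 2 Rads(Si).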

Dual  traces  are  shown  on  each  photograph  of  the  same  output.  The  upper 
trace  shows  long-term  response  and  recovery  of  the  device  while  the  lower, 
faster  trace  shows  more  detail  of  the  digital  bits  just  coming  out  of  the 
device.  A  logic  one  is  represented  by  a  negative  excursion  of  the  waveforms. 

5.3  TOTAL  IONIZING  DOSE  TESTS 

5.3.1  Test  Setup  and  Discussion  of  Results 

Total  ionizing  dose  testing  was  performed  on  the  NE1  CCD's  over  the 
period  of  April  through  June,  1978  in  TRW’s  Building  84  test  facility.  Both 
shift  registers  and  half-adders  were  tested.  The  objective  of  these  tests 
was  to  observe  the  effects  of  total  ionizing  dose  on  intrinsic  CCD  parameters 
such  as  charge  transfer  inefficiency,  threshold  voltage,  and  dark  current 
(these  in  the  shift  register)  and  also  to  observe  the  effects  on  functional 
operation  of  a  CCD  logic  device  (the  half-adder). 

Testing began with four samples of the shift register and two half-adder
samples from lot number NE1-1. The same test fixture and instrumentation
as in the upset tests were used here with the exception of the line
drivers, since the equipment was located close to the radiation source. The
radiation source was a Gammacell Co-60 source capable of providing approximately
1000 Rads(Si) per minute to the test sample. The test fixture, a circuit
board with a flat-pack test socket, was inserted into the experiment cylinder
of the Gammacell with all cabling routed out through the top orifice. This
allowed for in-source biasing of the device and immediate measurements when
the cylinder was raised. External cable changes were required when testing
the half-adder.

A plot of exposure time versus total absorbed dose is maintained on a
monthly basis for the Gammacell. Thus, the required exposure time for a given
dose on the test samples was easily obtained. As added verification of the
received dose, thermoluminescent dosimeters (TLD's) were occasionally
attached to the test fixture.

The shift registers were tested first. Pre- and post-irradiation data
consisted  of  oscilloscope  photographs  of  the  data  output  from  the  device  and 
X-Y  recorder  plots  of  surface  potential  versus  gate  voltage  for  two  of  the 
on-chip  test  FET's.  These  photographs  (as  well  as  real-time  observation) 
indicated  the  effect  of  radiation  on  the  functional  operation  of  the 
shift  register  and  allowed  the  measurement  of  charge  transfer  inefficiency 
(CTI).  By  measurement  of  the  gate  voltage  shift  on  the  X-Y  plots,  the 
threshold  voltage  change  was  read  directly.  Sample  total  dose  test  data  is 
contained  in  Section  5.3.2. 

Data  input  to  the  shift  registers  consisted  of  an  alternate  pattern  of 
eight  zeros  and  eight  ones.  No  "fat-zero"  was  used;  therefore  the  initial 
CTI  values  are  not  optimum.  The  initial  bias  voltages  and  clocking  ranges 
for  the  shift  registers  are  given  as  follows: 


Terminal        Voltage (V)
                0 to +10
                0 to +10
                0 to +10
                8.5 to 18
                0.8 to 8.2
                +8.0
                +3.0
                +4.0
                +15.0
                +2 to 7.6
                -8.0


With  these  values,  the  following  initial  CTI's  were  measured  for  the  four 
shift  register  samples: 


Sample No.    CTI
1             .002
3             .002
4             .004
18            .0009


After irradiation to approximately 500 Rads(Si) total dose, each shift
register began to experience severe degradation of functional operation. This
was manifested as a rapid rise of the zero bits in the data pattern up to
the logic one level. This pattern persisted even after the devices were
removed from the Co-60 source. By the time each device had received 1000
Rads(Si), the device data output was no longer usable with respect to logic
levels.

At this point, a readjustment of bias voltages was suggested. It was
found that by shifting the inject diode voltage swing negatively by 1 to
3 volts for all samples, functional operation could be restored
with a distinct pattern of ones and zeros. For sample numbers 1 and 3,
operation could be maintained in this manner (by continual adjustment of the
inject diode voltage) up to levels of approximately 20,000 Rads(Si). Sample
numbers 4 and 18 could not be operated past 10,000 Rads(Si).

Irradiation  was  continued  up  to  58,000  Rads(Si)  for  sample  number  1  and 
to  96,000  Rads(Si)  for  sample  number  3.  Measurements  of  post-radiation  CTI 
are  as  follows: 


Sample No.    Dose (Rads(Si))    CTI
1             12,000             .003
1             58,000             .0104
3             25,000             .0125
3             96,000             .0140

Threshold  voltage  shifts  were  measured  on  sample  numbers  1  and  4.  The 
FET's  on  number  1  showed  a  maximum  shift  of  -0.75  volts  at  58,000  Rads(Si). 


Next,  the  two  half-adder  samples  were  tested.  A  four-trace  oscilloscope 
was  used  to  monitor  the  two  inputs  and  the  sum  and  carry  outputs  of  the  device. 
Thus, the ability of the device to perform its arithmetic function was under
observation. 

The  functioning  of  both  half-adder  samples  began  to  fail  at  approximately 
1600  Rads(Si)  total  dose.  The  effect  was  that  of  the  logic  one  sum  bits 
going  to  zero  and  the  logic  zero  carry  bits  going  to  one.  As  with  the 
shift registers, an adjustment of bias was tried. This time clock phase four
(φ4) was shifted 1 to 2 volts positively to restore operation. With this
adjustment, the half-adders gave good functioning up to 9600 Rads(Si). Past
this level, both samples would only operate up to approximately 12,000 Rads(Si)
with another adjustment of φ4.

The  ability  to  restore  the  functional  operation  of  both  the  shift  register 
and  the  half-adder  after  total  dose  irradiation  by  adjustment  of  biases  indicates 
that  the  effects  observed  may  be  related  to  threshold  voltage  shift.  However, 
functional  operation  of  the  devices  began  to  degrade  at  only  about  1000  Rads(Si) 
while the threshold shifts measured on the test FET's did not exceed 1 volt at
levels as high as 58,000 Rads(Si). It is not likely that the devices' functional
operation would be sensitive to the even smaller (not measured) shifts which
would occur at low total doses. It is suspected that the functional aberrations
observed at low total doses are related to charging effects on the several
coaxial cables which lead to the test fixture and which were immersed in the
Co-60 source with the test sample.

After  the  final  dose,  two  shift  register  samples  were  "baked"  in  a 
laboratory  oven  at  200°C  for  approximately  18  hours  in  an  attempt  to  restore 
functional  operation.  Both  samples  recovered  to  their  pre-rad  operational 
states  using  the  original  values  of  bias.  Also,  the  threshold  shifts  of  the 
FET's  were  reversed  with  the  post-bake  plots  retracing  the  pre-rad  plots. 

Towards  the  end  of  the  NE1-1  total  dose  tests,  it  was  discovered  that 
the  oxide  layer  of  the  -1  lot  had  been  improperly  processed.  Since  this 
would  be  a  limiting  factor  in  the  total  dose  hardness  of  the  CCD,  it  was 
agreed to package a new set of test devices from a new lot (NE1-3) and retest
to, hopefully, higher total dose levels. The new samples would also be
tested at revised biases as determined from the -1 tests.

Total dose tests were performed on new NE1-3 shift registers and half-adders
in June of 1978 using the same methods as in the -1 tests. Surprisingly,
the  -3  samples  proved  to  be  even  more  susceptible  to  total  dose  with  functional 
operation  failing  at  as  low  as  500  Rads(Si)  in  spite  of  the  revised  biases. 

In  addition,  a  radiation-induced  interaction  of  the  test  FET's  with  device 
operation  was  observed.  It  was  found  that  by  biasing  the  gates  of  the  test 
FET's with a large voltage, the device could be restored to a semi-operational
state. 

At  present,  the  cause  of  the  FET  interaction  at  very  low  total  dose  has 
not been determined, but deterioration of the Teflon insulation in the test
fixture's  coaxial  cables  due  to  high  total  doses  is  suspected.  Construction 
of  a  new  test  fixture  to  verify  this  suspicion  was  not  attempted  due  to  the 
limited  test  time  available.  Instead,  it  was  elected  to  passively  (no  bias 
applied)  irradiate  a  -3  wafer  exclusive  of  any  test  fixture,  and  post-test 
on  a  wafer  probe  station.  This  would  allow  several  shift-registers  (for 
which  the  probes  and  driving  electronics  were  already  set  up)  to  be  tested 
functionally  while  the  test  FET's  on  each  die  remained  unbonded. 

A  wafer  was  irradiated  to  a  total  dose  of  30,000  Rads(Si)  and  then  returned 
to  the  probe  station.  Of  the  several  shift  registers  on  the  wafer  that  were 
probed, each was functional and had good output data waveforms; the only noticeable
change was an increase in CTI (similar to -1 measurements). The same
bias voltages were used for pre- and post-irradiation measurements.

Although  the  wafer  tests  show  promising  results  for  the  NE1  CCD's  (total 
dose  hardness  comparable  to  other  tests  in  the  literature),  additional  testing 
should  be  performed  with  bias  voltages  applied  to  the  devices  while  under 
irradiation.  This  is  a  worst-case  situation  for  N-channel  devices  since  the 
positive  gate  voltages  tend  to  drive  trapped  ionization-produced  charge 
toward  the  oxide-silicon  interface.  Preferably,  separate  test  fixtures  should 
be  used  for  irradiation  and  post-irradiation  measurements.  In  addition,  dark 
current  measurements  in  the  shift  register  which  were  neglected  due  to  time 
constraints  should  be  performed.  Once  adequate  total  dose  characterization 
is obtained, neutron tests can be performed.

5.3.2 Total Ionizing Dose Test Data


Figures 5-13 through 5-16 represent typical oscilloscope photographs
of the CCD shift register and half-adder operation over the range of total
ionizing doses received. During the irradiation, as waveform degradation
became excessive, device biases (inject diode voltage, φ4) were adjusted to
achieve the most acceptable output waveforms possible. For each sample the
final photo gives the maximum total dose for which a bias adjustment could
produce reasonable operation.


[Oscilloscope photographs. Panels: 2000 Rads(Si); 22,000 Rads(Si) (after
inject diode adjustment). Top trace: Input (5 V/div.). Bottom trace: Output
(2 V/div.). Time scale: 50 microseconds/div.]

Figure 5-13. Shift Register Output vs. Total Dose (Sample #1)

[Oscilloscope photographs. Panels: Pre-rad; 4100 Rads(Si); 11,000 Rads(Si)
(after inject diode adjustment); 25,000 Rads(Si) (after inject diode
adjustment). Top trace: Input (5 V/div.). Bottom trace: Output (2 V/div.).
Time scale: 50 microseconds/div.]

Figure 5-14. Shift Register Output vs. Total Dose (Sample #3)


[Oscilloscope photographs. Panels: Pre-rad; 1600 Rads(Si). Traces, top to
bottom: Sum Out, Carry Out, In 1, In 2. V: 2 V/div. H: 10 microseconds/div.]

Figure 5-15. Half-Adder Output vs. Total Dose (Sample #58)


[Oscilloscope photographs. Panels: 1600 Rads(Si) (with φ4 adjust);
9600 Rads(Si) (with φ4 adjust). Traces, top to bottom: Sum Out, Carry Out,
In 1, In 2. V: 2 V/div. H: 10 microseconds/div.]

Figure 5-16. Half-Adder Output vs. Total Dose (Sample #58)

6. SIGNAL AND DATA PROCESSING APPLICATIONS


6.1  INTRODUCTION 

The  arithmetic  capabilities  of  DCCL  chips  can  be  utilized  in  filtering 
applications:  correlation,  convolution,  and  fast  transforms  such  as  Fourier, 
Hadamard,  and  Hilbert.  The  inherent  structure  of  these  functions  allows 
them to be cast into flow-form, which is ideal for DCCL pipeline computations.
The flow-form of the structure allows timing and data routing to replace much
of the program memory and control logic found in general purpose processors.

A large number of different flow-form signal processing functions can be
formed using only two types of DCCL Large Scale Integrated (LSI) chips. One
chip provides the arithmetic function (AU) while the other provides memory
and control (MC).

6.2  APPLYING  THE  DCCL  CONCEPTS 

The basic arithmetic functions to be realized are addition/subtraction,
multiplication and scaling (multiplication by a power of two). The arithmetic
accuracy required for the different applications, as we have seen, can vary
widely; however, the more stringent applications, such as Itakura voice
processing, can generally be satisfied with 16-bit multiplication and
addition/subtraction accuracy. In some cases a double precision add/subtract
capability of 32 bits is required.

A  DCCL  chip  configuration  with  this  capability  is  shown  in  Figure  6-1.  To 
allow  the  sequence  of  arithmetic  operations  to  be  performed  in  different  orders, 
corresponding  to  different  applications,  multiplexers  are  placed  at  the  input 
to  the  adder  and  multiplier.  A  multiplexer  is  also  provided  at  the  output  so 
that  results  of  the  different  operations  can  be  selected. 

Control  inputs  are  accepted  by  the  arithmetic  chip  to  route  the  data 
through  the  desired  elements  so  that  a  prescribed  sequence  of  operation  is 
performed.

For example, the radix two form of the FFT kernel or "butterfly" can
be accomplished with six add/subtract operations and four multiplications.
Initially, the sum of the real parts of the complex inputs is computed and
the result applied to the output for storage. Next, the sum of the imaginary
parts is computed and applied to the output. The third and fourth steps
consist of computing the difference between the real parts and the difference
between the imaginary parts, respectively. These differences are then routed
through the multiplier where they are multiplied by the sine and cosine
"twiddle factors". Four passes through the multiplier are required to
compute the four products. The final steps in the "butterfly" computation,
which complete the complex multiply, are to sum one pair of products and to
compute the difference of the other pair. These two results are then returned
to storage.

Figure 6-1. DCCL arithmetic unit chip.

To perform a correlation computation, the control inputs to the arithmetic
chip cause the output of the adder to be fed back to the input thereby operating
as an accumulator. The sequences of the two input variables to be correlated
are then applied to the multiplier and the product accumulated. Registers
for latching the data words are not shown in Figure 6-1 because storage is
implicit in the operation of DCCL.
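
The butterfly and correlation sequences described above can be sketched in
Python. This is only an illustration of the order of operations, not the
chip's control program: the function names, the use of floating point in
place of 16-bit fixed point, and the twiddle-factor convention
W = cos(theta) - j sin(theta) are assumptions made for the example.

```python
import math


def butterfly(a, b, theta):
    """Radix-two butterfly: six add/subtract steps and four multiplies.

    a, b are complex inputs; theta is the twiddle angle (assumed convention
    W = cos(theta) - j*sin(theta)).
    """
    c, s = math.cos(theta), math.sin(theta)
    # Steps 1-4: sums and differences of the real and imaginary parts.
    sum_re = a.real + b.real
    sum_im = a.imag + b.imag
    dif_re = a.real - b.real
    dif_im = a.imag - b.imag
    # Four passes through the multiplier.
    p1 = dif_re * c
    p2 = dif_im * s
    p3 = dif_im * c
    p4 = dif_re * s
    # Final two add/subtract steps complete the complex multiply.
    out_re = p1 + p2
    out_im = p3 - p4
    return complex(sum_re, sum_im), complex(out_re, out_im)


def correlate(x, y, lag=0):
    """Multiply-accumulate: the adder output is fed back as an accumulator."""
    acc = 0
    for n in range(len(x) - lag):
        acc += x[n] * y[n + lag]
    return acc
```

The first output of `butterfly` is the sum a + b and the second is
(a - b) multiplied by the twiddle factor; `correlate` returns one correlation
value for a chosen lag.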

The  fundamental  consideration  in  applying  DCCL  is  the  throughput  delay. 
Best  computational  efficiency  is  obtained  using  pipeline  techniques.  Although 
in  many  of  the  important  applications,  the  computations  can  be  cast  into  a 
flow-form  suitable  for  pipelining,  there  is  also  a  need  for  general  purpose 
computer operations such as executing branching and jump instructions. Conditional
instructions of this type require a comparison to be made before the
next  step  in  the  program  is  determined.  Due  to  the  delay  through  the  DCCL 
arithmetic  logic,  it  is  difficult  to  perform  general  purpose  computing 
efficiently.  With  these  characteristics  of  DCCL  arithmetic  in  mind,  it  is 
important  to  tailor  the  control  chip  to  match  the  characteristics  of  the  AU. 

The  basic  timing  can  be  broken  up  into  blocks  of  N  clock  intervals  where 
N exactly matches the delay through the AU; typically this may be between
16 and 32 clock intervals. Control of the DCCL arithmetic unit can be
divided  into  block  and  intra-block  instructions.  The  intra-block  instruction 
controls  allow  pipelined  operations  to  be  performed,  like  the  FFT,  where  data 
words  are  added  and  then  subtracted  on  successive  clock  pulses.  By  changing 
the  block  instructions,  the  intra-block  instructions  can  be  changed  after 
each  block  of  N  clock  pulses.  For  example,  the  AU  can  perform  successive  sums 
and  differences  on  one  block  of  N  samples  followed  by  a  multiply  and  accumulate 
on the next block of N samples. Since the result of an arithmetic operation
becomes available at the arithmetic unit output after N samples, branching,
skip,  and  jump  instructions  can  be  performed  at  the  block  rate.  Thus,  the 
AU  can  either  perform  flow-form  types  of  computations  at  a  high  rate  or 
general  purpose  computations  at  a  lower  (by  a  factor  of  16  to  32)  block 
rate. 

The  separation  of  the  control  of  the  AU  into  block  and  intra-block 
functions  is  inherent  in  the  shift  register  type  of  architecture  for  the 
DCCL control chip as shown in Figure 6-2. The blocks of shift register memory
may be either conventional data memory or shift register read only memory
(ROM)  where  a  metallization  mask  determines  how  many  of  the  blocks  are  to  be 
ROM  as  well  as  the  contents  of  the  ROM.  Multiplexers  at  the  input  and  output 
control  the  flow  of  data  and  program  instructions  to  and  from  the  AUs. 

Each memory block in Figure 6-2 consists of shift registers and recirculation
logic. Nominally, the block would be 16 or 32 samples long with 16
shift registers in parallel corresponding to a 16-bit/word parallel data
format. The shift register block has a tap after N-1 samples so that data
can be recirculated back to the input after either N or N-1 sample delays.
This allows the memory blocks to be operated either as recirculating memory
or as a delay line time compressor (DELTIC) so that as the data precesses
through the register, the oldest sample is replaced by the new input sample.
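
The two operating modes of a memory block can be illustrated with a small
behavioral model. This is a sketch under stated assumptions, not the chip's
logic: the register is idealized as a Python list with stage 0 at the input,
the mode names are hypothetical, and it is assumed that in DELTIC operation
the new sample is loaded on the last clock of each N-clock block.

```python
def clock(reg, mode, new_sample=None):
    """Advance an N-stage shift register by one clock.

    reg[0] is the input stage and reg[-1] the last stage (N delays).
    """
    n = len(reg)
    if mode == "recirculate":      # feedback taken after N delays
        inp = reg[n - 1]
    elif mode == "deltic":         # feedback taken from the N-1 tap
        inp = reg[n - 2]
    elif mode == "load":           # external sample enters the register
        inp = new_sample
    else:
        raise ValueError("unknown mode: " + mode)
    return [inp] + reg[:-1]


def recirculate_block(reg):
    """N clocks of recirculation return the block to its starting state."""
    for _ in range(len(reg)):
        reg = clock(reg, "recirculate")
    return reg


def deltic_block(reg, new_sample):
    """N-1 clocks around the shorter loop, then one clock to load a sample.

    The N-1 samples in the loop return to their starting positions, and the
    load pushes the oldest of them out of the loop.
    """
    for _ in range(len(reg) - 1):
        reg = clock(reg, "deltic")
    return clock(reg, "load", new_sample)
```

In this model a block in recirculating mode is unchanged after N clocks,
while a block in DELTIC mode behaves as a sliding window that admits one new
sample per block time.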

When the DCCL control chip is operated with the DCCL arithmetic unit, the
two data outputs are selected by the multiplex gates and applied to the AU
inputs, while the arithmetic unit output is fed back to the input MUX. The
lower multiplex gate on the right in Figure 6-2 selects the control values from
either the ROM shift registers or a data shift register; thus, the controller
operation can proceed either according to a program stored in the ROM or be
changed by input values. The input values can be from either an external
source (e.g., an interrupt) or from values computed by the DCCL arithmetic unit.

When  operated  at  the  slower  block  rate,  both  the  data  and  the  next  program 
control  value  can  be  time-interleaved  so  that  both  are  computed  in  a  single  N 
clock  interval  block  time.  With  a  4  MHz  clock  rate  and  N  =  32,  both  operations 
can be accomplished in 8 µsec.
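
The timing figures quoted above follow from simple arithmetic, shown here as
a short Python sketch. The function names are hypothetical; the only inputs
are the clock rate and the block length N given in the text.

```python
def block_time_us(clock_hz, n):
    """Duration of one block of N clock intervals, in microseconds."""
    return n * 1e6 / clock_hz


def block_rate_hz(clock_hz, n):
    """Rate at which branch, skip and jump decisions can be made."""
    return clock_hz / n


# 4 MHz clock and N = 32, as in the text.
t_block = block_time_us(4e6, 32)
f_block = block_rate_hz(4e6, 32)
```

With these values the block time is 8 microseconds and the block rate is
125 kHz, a factor of 32 below the flow-form rate.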

6.3  SPECIFIC  NRL/DCCL  APPLICATIONS 

Two applications of DCCL to specific systems were considered (at different
points in time) in response to mutual TRW/NRL desires. The first system
application, which occupied the great majority of the project duration, was
for a DCCL Itakura transform chip(s). As the program progressed, two conditions
emerged which changed its direction and goals. First, the original need
for the Itakura transform function dissipated as a result of changes in a
parallel voice processing program (which was intended to use this transform
chip). Second, as the chip development progressed, estimates of the design
and layout effort required to realize the Itakura function began to indicate
that these efforts were beyond the scope of the original contract. As a
result of these two conditions, a sorting application emerged as a more
useful project goal. By mutual agreement, a new goal of designing a number
sorting chip was undertaken.

The following paragraphs describe these two applications from the standpoint
of DCCD implementation.


6.3.1  Itakura  Transform  Application 


Secure  voice  processing  is  a  potential  application  for  the  pair  of 
DCCL  chips.  The  problem,  stated  simply,  is  to  digitize  the  speech  signal 
so  that  the  security  feature  can  be  implemented  by  performing  numerical 
operations  on  the  data.  Unfortunately,  digitizing  the  speech  directly  (e.g., 
using  PCM)  results  in  a  greatly  increased  bit  rate.  In  order  to  reduce  the 
bit  rate  so  that  it  can  be  transmitted  over  existing  audio  channels,  it  is 
necessary  to  perform  the  secure  voice  processing  which,  in  essence,  compresses 
the  bandwidth  by  source  coding. 

One of several voice processing techniques is to employ a linear predictive
coding (LPC) algorithm (ITAKURA, F. et al., 1972). The procedure for
the analyzer part of the voice processor consists of passing a signal
through ten identical stages (as shown in Figure 6-3). In each stage a
Parcor (ITAKURA, F. et al., 1972) coefficient, $K_i$, $i = 1, 2, \ldots, 10$,
is determined and these ten coefficients in addition to pitch and voicing
information are sent to the receiver where the speech is reconstructed.

As indicated in Figure 6-3, the Parcor coefficients can be written

$$K_i = \frac{\rho_{i-1}}{P_{i-1}}$$

where the $\rho_{i-1}$ are correlation values and the $P_{i-1}$ are the mean
square forward prediction errors (or backward prediction errors). The
$\rho_{i-1}$ and $P_{i-1}$ can be written

$$\rho_{i-1} = \sum_{n=1}^{N} A_{i-1} B_{i-1}$$

and

$$P_{i-1} = E\left[A_{i-1}^{2}\right]$$

where E denotes expectation. The ith stage forward (and backward) prediction
errors, $A_i$ and $B_i$, are related to the previous stage values and the
Parcor coefficient, $K_i$, by

$$A_i = A_{i-1} - K_i B_{i-1}, \qquad B_i = B_{i-1} - K_i A_{i-1}$$

Figure 6-3. Itakura analyzer.


In  practice,  the  summation  over  N  samples  which  forms  the  correlation  is 
replaced  by  lowpass  filtering.  The  expected  value  of  the  prediction  error  is 
approximated  by  subtracting  the  product  of  the  Parcor  coefficient  times  the 
correlation from the value in the previous stage. A block diagram indicating
the operations necessary to realize an Itakura analyzer stage is shown in
Figure 6-3.

Due to the relatively long propagation delay through the DCCL arithmetic
unit, and the desire to employ pipeline computation for best speed
and  efficiency,  it  is  necessary  to  separate  the  computation  into  a  number 
of  steps.  Each  Itakura  analyzer  section  can  be  computed  with  a  sequence  of 
seven  passes  through  the  arithmetic  chip  to  perform  the  following  operations: 

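(1) Multiply $A_{i-1} \cdot B_{i-1}$

(2) Perform lowpass filter computation
    $\rho_{i-1} \leftarrow 2^{-k}\,(A_{i-1} B_{i-1}) + (1 - 2^{-k})\,\rho_{i-1}$

(3) Compute $1/P_{i-1}$

(4) Compute $K_i = \rho_{i-1} \cdot (1/P_{i-1}) = \rho_{i-1}/P_{i-1}$

(5) Update $P_i = P_{i-1} - K_i \rho_{i-1}$

(6) Update $A_i = A_{i-1} - K_i B_{i-1}$

(7) Update $B_i = B_{i-1} - K_i A_{i-1}$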

In the above list of operations, each can be accomplished in a straightforward
manner by applying the appropriate controls to the arithmetic chip.
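
The seven passes for one analyzer stage can be sketched as a single Python
function. This is an illustration under stated assumptions, not the chip
design: floating point replaces fixed point, the divide is done directly
rather than by table lookup, and the function name, argument names and the
lowpass exponent parameter `k_shift` are hypothetical.

```python
def itakura_stage(a_prev, b_prev, rho_state, p_prev, k_shift=4):
    """One sample through one Itakura analyzer stage (seven AU passes).

    a_prev, b_prev : forward and backward prediction errors from stage i-1
    rho_state      : lowpass filter state holding the running correlation
    p_prev         : mean square prediction error from stage i-1
    k_shift        : lowpass coefficient exponent, alpha = 2**-k_shift
    """
    alpha = 2.0 ** -k_shift
    prod = a_prev * b_prev                           # (1) multiply
    rho = alpha * prod + (1.0 - alpha) * rho_state   # (2) lowpass filter
    inv_p = 1.0 / p_prev                             # (3) reciprocal
    k = rho * inv_p                                  # (4) Parcor coefficient
    p_next = p_prev - k * rho                        # (5) update P
    a = a_prev - k * b_prev                          # (6) update A
    b = b_prev - k * a_prev                          # (7) update B
    return a, b, k, rho, p_next
```

Each line marked (1) through (7) corresponds to one pass through the
arithmetic chip; the returned `rho` would be kept as the filter state for the
next sample.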

The  divide  operation  can  be  implemented  by  a  recursive  algorithm,  which 
requires  a  ROM  table  lookup  to  estimate  the  inverse  value  followed  by  two 
multiplications  and  an  addition  (HUTCHINS,  S.  E.  et  al.,  1975).  Typically, 
a  conventional  ROM  would  be  used;  however,  a  DCCL  ROM  could  be  used  if  the 
longer  access  time  could  be  tolerated. 
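
A table lookup followed by two multiplications and an addition matches the
shape of one Newton-Raphson refinement of a reciprocal, so the divide can be
sketched that way in Python. That the recursion is Newton-Raphson is an
assumption made for this example, as are the 16-entry table, the
normalization of the divisor to the range 0.5 to 1, and all names.

```python
TABLE_BITS = 4  # assumed ROM address width (16 entries)

# ROM contents: reciprocal of the midpoint of each divisor interval.
ROM = [
    1.0 / (0.5 + (i + 0.5) / 2 ** (TABLE_BITS + 1))
    for i in range(2 ** TABLE_BITS)
]


def reciprocal(d):
    """Estimate 1/d for a divisor normalized to 0.5 <= d < 1."""
    if not 0.5 <= d < 1.0:
        raise ValueError("divisor must be normalized to [0.5, 1)")
    index = int((d - 0.5) * 2 ** (TABLE_BITS + 1))
    x0 = ROM[index]      # table lookup: initial estimate of 1/d
    t = d * x0           # first multiplication
    e = 2.0 - t          # addition (subtraction from the constant 2)
    return x0 * e        # second multiplication: refined estimate
```

One refinement step roughly squares the relative error of the table entry,
so a larger table or a second pass would be needed for full 16-bit accuracy.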

A  primary  difficulty  in  implementing  the  Itakura  analyzer  in  an  efficient 
pipeline  fashion  with  DCCL  is  that  the  result  of  one  pass  through  the  AU  must 
be  available  before  the  next  can  commence.  This,  of  course,  holds  true  for 
each of the ten stages in the Itakura analyzer. The solution to this
problem is to employ interleaving. By accepting, for example, a 10 sample
delay, the first operation on the list for each stage can be computed during
the N clock pulse interval corresponding to the propagation time through the
AU. On the next pass through the AU the second operation on the list is
performed for each of the ten interleaved stages. After the last operation
of the first analyzer stage has been completed, the time position of the
result is shifted one clock interval utilizing the variable length shift
register (or DELTIC) in the control chip so that the time position corresponds
to the interleaving for the second analyzer stage. This procedure is repeated
to complete the ten-stage analyzer computation. Although this distribution in
time of the computation of different analyzer sections and the interleaving of
the operations would require a complicated indexing scheme with a general purpose
computer, with the shift register structure of the DCCL control chip the
required control becomes simple and straightforward.

6.3.2  Sort  and  Merge  Applications 

The majority of the time on most general purpose computers is spent sorting and reassembling
numbers,  yet  there  is  no  LSI  device  available  that  has  been  designed  to  do  this 
function  efficiently.  In  August  1978,  work  on  the  voice  processor  application 
was  stopped  to  permit  a  study  of  the  feasibility  of  designing  a  digital  CCD 
Sort  and  Merge  chip. 

This  section  discusses  the  different  Sort  and  Merge  designs  that  resulted 
from  this  design/study  effort. 

6.3.2.1 DCCL Sort and Merge Technique No. 1

In every system application considered, the unsorted numbers are received
in a continuous stream of 2's Complement Numbers of 16 or 32 parallel bits.
The words may be converted from parallel to serial format before entering the
CCD chip in order to reduce the number of input pads. On the chip the input
is converted back from a serial to parallel format with a CCD shift register.

The larger of two 2's Complement Numbers is determined by subtracting
one of the numbers from the other and examining the binary value of the most
significant or sign bit. Subtraction of 2's Complement Numbers is performed
by complementing one of the numbers and then adding together the complemented
number, the other number, and a binary one.

EXAMPLE 1. To subtract B = 4 from A = 15

    Complement B = 0 0 1 0 0;  $\bar{B}$ = 1 1 0 1 1

            A = 0 1 1 1 1
          ADD   0 0 0 0 1

                0 1 0 1 1 (binary) = +11
                ^ Sign Bit

The binary "0" value of the Sign-Bit indicates
that A > B (i.e. +11)

EXAMPLE 2. Compare A = 4 and B = 15

    Complement B = 0 1 1 1 1;  $\bar{B}$ = 1 0 0 0 0

            A = 0 0 1 0 0
          ADD   0 0 0 0 1

                1 0 1 0 1 (binary) = -11
                ^ Sign Bit

The binary "1" value of the Sign-Bit indicates
that A < B (i.e. -11)

The  same  method  of  comparing  2's  Complement  Numbers  by  subtraction  holds  true 
for  negative  2's  Complement  Numbers. 

It  can  be  seen  from  the  examples  given  above  that  only  the  sign-bit  is 
required  to  determine  which  of  the  two  numbers  is  the  larger;  the  actual 
magnitude  of  the  difference  is  not  required. 
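
The comparison in the two examples can be reproduced with a few lines of
Python. This is a sketch, not the chip logic: the function name and the
default 5-bit word length are chosen to match the examples, and the sign-bit
test is only meaningful when the true difference A - B fits in the word
length (no overflow), which is an assumption of this example.

```python
def sign_bit_of_difference(a, b, bits=5):
    """Return (sign_bit, word) for A - B in two's complement.

    The subtraction is done as A + complement(B) + 1, keeping only
    `bits` bits. sign_bit == 0 means A >= B; sign_bit == 1 means A < B.
    """
    mask = (1 << bits) - 1
    a_word = a & mask              # A as a `bits`-wide word
    b_complement = ~b & mask       # bitwise complement of B
    word = (a_word + b_complement + 1) & mask
    sign_bit = (word >> (bits - 1)) & 1
    return sign_bit, word


# The two worked examples from the text.
example_1 = sign_bit_of_difference(15, 4)   # A = 15, B = 4
example_2 = sign_bit_of_difference(4, 15)   # A = 4,  B = 15
```

Only the returned sign bit is needed for sorting; the difference word is
returned here so the examples can be checked bit for bit.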

In order to generate the sign-bit, we must add together the most significant
bit of the complemented number, $\bar{B}_m$, the most significant bit of the
other number, $A_m$, and a carry bit. The carry-bit, $C_m$, is generated by adding
the rest of the complemented number, $\bar{B}_{m-1}$ through $\bar{B}_0$, the
rest of the other number, $A_{m-1}$ through $A_0$, and the binary "1" bit, $C_0$.

The truth table for the carry-bit, $C_1$, generated by adding together the
least significant bits of the complemented word, $\bar{B}_0$, the other word,
$A_0$, and the binary "1" input, $C_0$, is shown in Table 6-1.

Table 6-1. Truth Table for the Generation
of a Carry from a Full-Adder

        Inputs          Output
    A     B     C       C1
    0     0     0       0
    0     0     1       0
    0     1     0       0
    0     1     1       1
    1     0     0       0
    1     0     1       1
    1     1     0       1
    1     1     1       1

The Boolean expression for the truth table is:

$$C_1 = A_0 B_0 + (A_0 + B_0)\,C_0 \qquad (1)$$

The Boolean expression for the second least significant carry can be written
in a similar manner:

$$C_2 = A_1 B_1 + (A_1 + B_1)\,C_1 \qquad (2)$$

If the carry input factor, $C_1$, in Equation (2) is replaced with its Equation
(1) value, the expression for $C_2$ becomes:

$$C_2 = A_1 B_1 + (A_1 + B_1)(A_0 B_0) + (A_1 + B_1)(A_0 + B_0)\,C_0 \qquad (3)$$

The carry outputs for the remainder of the 32-bits can be expressed by an
extension of Equation (3). The carry-bit charge packets can be generated
in a simple manner in DCCL by cascaded AND gates as shown in Figure 6-4.
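
The carry recurrence of Equations (1) and (2) and the expanded form of
Equation (3) can be checked against each other with a short Python sketch.
The function names are hypothetical, and the bit lists are assumed to be
ordered least significant bit first.

```python
def carry_chain(a_bits, b_bits, c0=1):
    """Apply C[i+1] = A[i]B[i] + (A[i] + B[i])C[i] along the word.

    a_bits, b_bits: lists of 0/1 values, least significant bit first.
    Returns the carry out of the last bit position supplied.
    """
    c = c0
    for a, b in zip(a_bits, b_bits):
        c = (a & b) | ((a | b) & c)
    return c


def c2_expanded(a0, a1, b0, b1, c0):
    """Equation (3): C2 written directly in terms of the inputs and C0."""
    return (a1 & b1) | ((a1 | b1) & (a0 & b0)) | ((a1 | b1) & (a0 | b0) & c0)
```

Here `carry_chain([a0, a1], [b0, b1], c0)` evaluates Equation (2) with
Equation (1) substituted step by step, so it should agree with `c2_expanded`
for every input combination.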

It  was  shown  in  the  previous  examples  that  only  the  most  significant 
sum  bit  is  required  to  determine  which  of  two  numbers  is  the  larger.  The 
expression  for  the  most  significant  sum  bit  is: 

$$S = \bar{A}\bar{B}C + \bar{A}B\bar{C} + A\bar{B}\bar{C} + ABC \qquad (4)$$


Figure  6-4.  A  Sort  and  Merge  Chip  Comparator  Schematic  Diagram. 


Since  Equation  (4)  involves  complemented  terms  it  is  necessary  to  use 
a  floating-gate  cell;  and  with  three  inputs  we  must  use  a  full-adder. 

Each AND function is performed during one phase of a three phase clock
period and therefore, with the exception of the most significant bit, will
require (N-1)/3 clock periods to perform the comparison and generate the next
to most significant carry bit (where N = number of bits per word).

If cascaded dual half-adders are used to generate the most significant
sum bit, two clock periods are required. The comparator will then require
a minimum clock frequency of

$$f_c = \left[\frac{N-1}{3} + 2\right] f_0 \qquad (5)$$

where $f_0$ is the clock frequency of the numbers entering the comparator.
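
Equation (5) is easy to evaluate for the word lengths discussed in this
section. The Python sketch below is illustrative only: the function name is
hypothetical and the 1 MHz input word rate is an assumed value, not a figure
from the report.

```python
def min_comparator_clock_hz(n_bits, f0_hz):
    """Equation (5): minimum comparator clock for N-bit words.

    (N - 1)/3 clock periods generate the carry into the sign position and
    two more periods generate the most significant sum bit.
    """
    return ((n_bits - 1) / 3 + 2) * f0_hz


# Assumed input word rate of 1 MHz, for illustration only.
fc_16 = min_comparator_clock_hz(16, 1e6)
fc_32 = min_comparator_clock_hz(32, 1e6)
```

For 16-bit words the comparator clock must run at seven times the word rate;
for 32-bit words the factor is a little over twelve.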

The unsorted 2's complement numbers enter the Sort and Merge chip
serially so that the first numbers have to be retained in a memory for
comparison with later numbers. The method that we have devised for the
temporary store is to perform a serial to parallel conversion and transfer the
parallel numbers, through a 1 to 4 multiplexer, into four parallel registers
which are filled sequentially. The first number enters Register A1, shown
in Figure 6-5. The clocks to Register A1 are then inhibited, holding the first
number in Register A1. The second number then enters Register B1. The
clocks to Register B1 are then inhibited, thereby holding the second number
in Register B1. Similarly, the third number enters Register C1 and is held.

Simultaneously with the third number entering Register C1, the Register
Select Gates AND 1 and AND 2 are enabled so that the first and second numbers
enter their respective Restore and Convert circuits (RC1 and RC2). The
Restore and Convert circuits sense the binary value of the input charge by
comparing it with another charge of fixed size. Based on the result of
this comparison, a MOS flip-flop is toggled to the appropriate binary state.
The flip-flop provides a digital voltage output that corresponds to the binary
value of the input charge. The Restore and Convert circuit also provides an
output voltage that is the complement of the binary value of the input charge.
The true digital voltage output from RC1 and the complemented digital voltage
output from RC2 are converted to charge packets that are then applied to the
AND gates and full-adder of the comparator.


The  comparison  of  the  first  and  second  numbers  takes  place  during  one 
cycle  of  the  input  clock  rate  for  the  unsorted  numbers. 

Depending  upon  the  binary  value  of  the  most  significant  sum  bit  from 
the  number  comparison,  either  the  A1  or  B1  registers  and  the  output  from  its 
corresponding  RC  circuit  are  enabled.  The  clock  signal  to  the  register  that 
is  holding  the  larger  (or  smaller,  if  that  is  the  order  of  the  sort)  of  the 
first  and  second  numbers  is  enabled  and  that  number  is  transferred  into  register 
A2.  Register  A2  can  store  two  numbers. 

The  fourth  number  then  enters  Register  D1  and  the  clocks  to  register  D1 
are  inhibited  thereby  retaining  this  value  in  the  D1  register.  Simultaneously 
with  the  fourth  word  entering  register  D1  ,  the  clocks  to  the  register  containing 
the  number  remaining  in  either  registers  A1  or  B1  are  enabled  and  that  number  is 
transferred  into  register  A2. 

The fifth number enters and is held in Register A1; this starts a new
cycle of the first comparison level. Simultaneously with the fifth word
entering Register A1, the Register Select Gates AND 3 and AND 4 are enabled so
that the third and fourth numbers enter their respective Restore and Convert
circuits (RC1 and RC2) and the comparator. Numbers 3 and 4 are then sorted
in a similar manner to 1 and 2 and placed into Register B2. Numbers 5 and 6
are sorted and placed in Register C2 and numbers 7 and 8 are sorted and placed
in Register D2.

While the next four numbers are being sorted and placed into Registers C2
and D2, the four numbers in Registers A2 and B2 are sorted and placed in
Register A3. Register A3 can hold four numbers. Sorting is carried out by
comparing  the  first  number  entered  into  Register  A2  with  the  first  number  that 
was  entered  into  Register  B2.  The  clocking  to  the  register  containing  the 
larger  of  the  two  numbers  (or  smaller,  if  that  is  the  order  of  the  sort) 
is  enabled  so  that  the  number  is  transferred  into  Register  A3  and  is  replaced 
by  the  second  number  transferred  into  Register  A2  or  B2.  A  comparison  then 
takes  place  between  the  number  retained  by  the  inhibited  clock  register  and 
the  number  replacing  the  one  just  transferred  out. 

This  sequence  of  comparison  and  shift  continues  until  all  four  numbers 
in  Registers  A2  and  B2  are  transferred  into  Register  A3  leaving  Registers  A2 
and  B2  empty  for  the  ninth,  tenth,  eleventh  and  twelfth  numbers. 

Sorting  and  storing  the  numbers  continues  through  the  remaining  stages
with  the  length  of  the  storage  registers  doubling  at  each  stage  until  the 
required  quantity  of  numbers  are  held  in  two  registers.  The  numbers  in  these 
two  registers  are  then  compared  once  more  and  then  transferred  into  a  parallel 
to  serial  converter  before  transferring  off  the  chip  on  a  single  serial  line. 
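
The register-doubling procedure described above can be sketched in software as a pairwise merge. This is only a behavioural illustration: the function names `merge_registers` and `sort_and_merge` are hypothetical, Python integers stand in for the 2's complement words, and clock inhibiting, the Restore and Convert circuits and the comparator timing are not modelled.

```python
from collections import deque


def merge_registers(reg_a, reg_b, descending=True):
    """Merge two already sorted registers into one of twice the length.

    The head words of the two registers are compared; the winner is shifted
    out and the next word of the same register takes its place, as in the
    A2/B2 -> A3 step of the text.
    """
    a, b = deque(reg_a), deque(reg_b)
    out = []
    while a and b:
        take_a = a[0] >= b[0] if descending else a[0] <= b[0]
        out.append(a.popleft() if take_a else b.popleft())
    out.extend(a)  # one register is empty; drain the other
    out.extend(b)
    return out


def sort_and_merge(numbers, descending=True):
    """Sort by repeated pairwise merging, doubling register length per stage."""
    registers = [[n] for n in numbers]  # first level holds one word each
    while len(registers) > 1:
        merged = []
        for i in range(0, len(registers) - 1, 2):
            merged.append(
                merge_registers(registers[i], registers[i + 1], descending)
            )
        if len(registers) % 2:  # odd register carries over to the next stage
            merged.append(registers[-1])
        registers = merged
    return registers[0] if registers else []


if __name__ == "__main__":
    print(sort_and_merge([3, -1, 7, 2, 5, 0, -4, 6]))
```

For eight input words the sketch passes through three stages (registers of length two, four and eight), which mirrors the A2/B2/C2/D2, A3 and final-register levels of the description.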

A three phase clock system may be used for the registers. By making
φ3 more positive than the other two phases, and by making a φ3 clock line
unique for each of the registers, each register can be either enabled or
inhibited by switching the φ3 clock lines and allowing the common φ1 and φ2
clock lines to run continuously.

Using Equation (5), which was derived previously for a 3-phase clocking
system, and assuming an unsorted 16-Bit word rate of f0 = 1 MHz, the required
DCCL clock rate is

fc = [(16-1)/3 + 2] f0
   = 7.0 MHz.

In September 1978, a half-adder design that would operate comfortably at
a clock speed of 7.0 MHz was not available. This subsequently resulted in the
abandonment of this approach from a design risk standpoint. One year later
a 4-phase ACD2 half-adder was operated at 5.0 MHz with a predicted upper
operating limit of 10 MHz. If Equation (5) is modified for 4-phase operation,
it becomes:

fc = [(16-1)/4 + 2] f0.    (6)

Now, for an unsorted 16-Bit word rate of f0 = 1 MHz, the required DCCD clock
frequency is only 5.75 MHz. At this point in time, this sorting technique
appears to be a feasible DCCL option.
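
The two clock-rate figures quoted above can be checked with a short calculation. The sketch below simply evaluates Equations (5) and (6); the function name `comparator_clock_mhz` is an illustrative choice, and the number of clock phases is passed as a parameter on the assumption that it is the only term that changes between the two equations.

```python
def comparator_clock_mhz(n_bits, word_rate_mhz, phases):
    """Minimum comparator clock rate, fc = [(N-1)/phases + 2] * f0, in MHz."""
    return ((n_bits - 1) / phases + 2) * word_rate_mhz


if __name__ == "__main__":
    # 16-bit words arriving at 1 MHz
    print(comparator_clock_mhz(16, 1.0, phases=3))  # Equation (5): 7.0
    print(comparator_clock_mhz(16, 1.0, phases=4))  # Equation (6): 5.75
```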

6.3.2.2  DCCL Sort and Merge Technique No. 2

Investigation of a second technique was initiated in November 1978.
This approach integrated the magnitude comparison and the sort store into
single, bit size, logic blocks as shown in Figure 6-6.

Figure  6-6.  Block  diagram  of  a  sort  and  merge  algorithm
with  the  features  of  allowing  the  bit  length 
of  the  numbers  being  compared  to  be  expandable 
in  the  row  direction  and  the  quantities  of 
sorted  numbers  to  be  expandable  in  the  column 
directions. 

The description of this sort and merge algorithm begins by assuming that
the first word, a_i, has been entered and stored in the first row of logic
blocks, with the most-significant-bit (MSB) in the left block. The next
number, b_i, enters the first row of logic blocks and the MSB of b_i is
compared with that of a_i to determine which has the higher binary value.
If a_i is equal to b_i, then a control signal, C^0_{i-1}, equal to a binary 0,
is generated and transmitted to the second-most-significant logic block. This
binary 0 level indicates that no determination on the number size has been
made at the MSB's and a comparison now has to be made at the second or lower
binary level. In addition to the C^0_{i-1} signal, the MSB logic block will
generate an output bit, S_i, that is of the same binary value as a_i or b_i
(remember that a_i = b_i in the present case). This S_i signal is transmitted
as the b_i input to the next row of logic blocks.

For the case when a_i and b_i do not have equal MSB values, the control
signal, C^0_{i-1}, equal to a binary 1, is generated. This C^0_{i-1} signal is
transmitted to the rest of the logic blocks in that row, indicating that no
further comparisons are necessary. In addition to the C^0_{i-1} signal, a
second control signal, C^1_{i-1}, is generated and also transmitted to the
remaining logic blocks in the row. C^1_{i-1} = binary 1 when b_i = binary 1
and C^1_{i-1} = binary 0 when a_i = binary 1. If a_i is a binary 1, the stored
number a_i in that row is output as S_i to the next row of logic blocks and
b_i is stored for the next comparison. However, if b_i is binary 1, the b_i
number is output from the logic block row to the next lower row and a_i is
restored for a subsequent comparison.
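
The behaviour of one row of logic blocks, as described in the two paragraphs above, can be sketched as follows. This is a behavioural model only and is not derived from the truth table or logic equations of the report: the names `sam_block` and `sam_row` are hypothetical, the control signals are called `c0` and `c1`, and the words are treated as unsigned magnitudes, which is an assumption.

```python
def sam_block(a, b, c0_in, c1_in):
    """One bit-size logic block.

    a is the stored bit, b the incoming bit. Returns
    (s, keep, c0_out, c1_out): the bit passed to the next row, the bit left
    in store, and the control signals sent to the next less significant block.
    """
    if c0_in == 0:  # no decision has been made at the more significant bits
        if a == b:
            return a, a, 0, 0  # still undecided; the bits are identical
        # Bits differ: the decision is made here. C1 = 1 when b = 1.
        return 1, 0, 1, b  # the larger word's bit goes out, the smaller stays
    if c1_in == 0:  # decided upstream, a is larger: output a, store b
        return a, b, 1, 0
    return b, a, 1, 1  # decided upstream, b is larger: output b, keep a


def sam_row(stored, incoming, n_bits, c0=0, c1=0):
    """Pass two n_bits-wide words through one row, MSB first.

    Returns (output_word, kept_word).
    """
    out = keep = 0
    for i in range(n_bits - 1, -1, -1):
        a = (stored >> i) & 1
        b = (incoming >> i) & 1
        s, k, c0, c1 = sam_block(a, b, c0, c1)
        out = (out << 1) | s
        keep = (keep << 1) | k
    return out, keep


if __name__ == "__main__":
    print(sam_row(0b1001, 0b0101, 4))  # larger word out, smaller word kept
```

In this model the larger word always leaves the row and the smaller one remains in store. Forcing the control inputs of the MSB block to `c0=1, c1=0` makes the row output its stored word regardless of the incoming one.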

A truth table of the four input and output terms of a logic block is
shown in Table 6-2. The max-terms and min-terms derived from Table 6-2 can
be reduced to the four logic equations (7 through 10) shown below.

Table 6-2. Signal and control lines for a single sort and
merge logic block.

To implement these four equations, each input or control term must have
a fan-out of up to five. Fortunately we can use some of the inherent DCCL
features to reduce this requirement. As an example, if a_i and b_i are the two
inputs to a DCCL half-adder, the sum output will be
a_i \bar{b}_i + \bar{a}_i b_i, which is a major part of the C^0 expression,
while the carry output from the same half-adder will be the a_i b_i part of
the S_i expression.

Figure 6-7 shows the logic design for the new algorithm that integrates
the magnitude comparison and the sort store technique described above. The
Figure 6-7 DCCL cell requires five charge refresh cells, five charge
fan-out cells, one half-adder, eight AND gates and six OR gates. A
computer analysis shows that at a CCD gate density of 3.4 mils^2/gate, this
71 gate SAM design will require an area of 15.5 x 15.5 mils and, at a 2 MHz
clock rate, will dissipate 143 microwatts of CV^2 f power.
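
The area figure can be cross-checked with simple arithmetic, assuming the quoted density is 3.4 square mils per gate (the unit exponent is not legible in the source). The second function backs out the switching energy per clock cycle implied by the quoted power; the function names are illustrative and no capacitance or voltage values are assumed.

```python
import math


def square_side_mils(n_gates, area_per_gate_sq_mils):
    """Side of a square cell holding n_gates at the given area per gate."""
    return math.sqrt(n_gates * area_per_gate_sq_mils)


def energy_per_cycle_joules(power_watts, clock_hz):
    """Energy dissipated per clock cycle, C*V^2 = P / f."""
    return power_watts / clock_hz


if __name__ == "__main__":
    print(round(square_side_mils(71, 3.4), 2))    # close to the quoted 15.5 mils
    print(energy_per_cycle_joules(143e-6, 2e6))   # about 71.5 pJ per cycle
```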

One problem with this sorting approach is how to empty the sorted numbers
from the SAM chip and how to load the first unsorted words into the chip. These
two difficulties can be overcome by simply keeping the two control line inputs
to the most significant bit at C^0 = 1, C^1 = 0 when reading out the chip.
The C^0 = 1 indicates that no further comparisons are necessary and the
C^1 = 0 indicates that the stored data must be outputted to the next
lower level. The output stored data is replaced by a vector that is equal
to the largest positive number. The C^0 = 1, C^1 = 0 controls are shifted
down, clocking the sorted numbers out of the chip.

The first unsorted number of the next block immediately follows the
vector, which is now the a_i in store. Since the unsorted number will be
smaller than or equal to a_i, the vector will be outputted to the next lower
level.

From  the  logic  diagram  shown  in  Figure  6-7,  it  can  be  seen  that  there 
are sixteen places where the signal paths (CCD channels) cross. This large
number of crossovers and required fan-outs results in a large area per word
requirement. For a reasonable die size, the number of chips required to
implement  this  sorting  technique  is  large  in  comparison  with  other  sort  and 
merge  approaches.  Consequently,  this  approach  was  abandoned  in  January  1979. 

6.3.2.3  DCCL Sort and Merge Technique No. 3

Investigation of a radix exchange algorithm for the DCCL Sort and Merge
application was initiated in January 1979. In the radix exchange algorithm,
an M-long list of unsorted N-bit wide words is sorted by decision logic into
two registers, depending on the binary value of the most significant bit,
as shown in Figure 6-8. In order to keep the two sorted stacks in their
sorted positions during future sort separations, a delimiter binary-one bit
is added to the left of the most-significant-bit on the first and last words
of each stack. At the end of the input word list, two lists, resulting
from the first (MSB) sort, are separated into their respective registers.

Figure 6-8. First step of the radix exchange sort and
merge algorithm.


The clocks to register 2 (MSB = 0) are inhibited while those to register 1
are reversed, so that the words in register 1 are rerouted through
the decision logic (Figure 6-9). On this second pass, the words are sorted
into registers 3 and 4, depending on the binary value of the second-most-
significant-bit. Again, delimiter bits are added to the first and last words
in each stack. The clocks to register 1 are then inhibited and reverse order
clocks to register 2 are enabled, emptying register 2 through the decision
logic into registers 3 and 4. Again, at the end of the sort, delimiters are
added to the first and last words in each stack.

The clocks to register 4 are then inhibited and reverse clocks to register
3 are enabled, emptying through the decision logic into the now empty registers
1 and 2 (Figure 6-10). On this third pass, the words are sorted depending
upon the binary value of the third-most-significant-bit. When a delimiter-bit
is reached, the clocks to register 3 are inhibited, new delimiter bits are
added to the stacks in registers 1 and 2 and the reverse clocks to register
4 are then enabled. The words in register 4 then empty through the decision
logic until a delimiter-bit is reached. At that stage, the clocks to
register 4 are inhibited and the reverse clocks to register 3 enabled. This
process continues until all words have been sorted according to the binary
value of each bit.
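
The logical effect of the passes described above can be sketched as an MSB-first binary partition. This is an abstraction, not the register-level procedure: the function name `radix_exchange_sort` is hypothetical, list boundaries play the role of the delimiter bits, the words are treated as unsigned, and the four physical registers with their reverse (LIFO) clocking are not modelled.

```python
def radix_exchange_sort(words, n_bits):
    """Sort unsigned n_bits-wide words by partitioning on each bit, MSB first.

    Each group is a stack bounded by delimiters; on every pass each group
    is split by the decision logic into a 0 sub-stack and a 1 sub-stack
    that keep the group's place in the overall order.
    """
    groups = [list(words)]
    for bit in range(n_bits - 1, -1, -1):
        next_groups = []
        for group in groups:
            zeros = [w for w in group if not (w >> bit) & 1]
            ones = [w for w in group if (w >> bit) & 1]
            for part in (zeros, ones):
                if part:  # empty stacks need no delimiters
                    next_groups.append(part)
        groups = next_groups
    return [w for group in groups for w in group]


if __name__ == "__main__":
    print(radix_exchange_sort([5, 3, 7, 0, 3, 6], n_bits=3))
```

After N passes every group contains only identical words, so concatenating the groups in order gives the sorted list. The sketch produces ascending order; swapping the order of the two sub-stacks would give descending order.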

The long registers required for this sorting technique are configurable
as SPS-CCD arrays. Since it is required that the data flow through these
registers be bi-directional, the necessity for a bi-directional (LIFO as
well as FIFO) SPS memory design emerged. A LIFO capability satisfies the
Sort and Merge application needs and a normal SPS (FIFO) capability provides
for recirculation in one stack while data is being inserted or removed from
other stacks (a feature which allows for increasing the length of the sorted
numbers). To provide for data recirculation, it was necessary to include
a recirculating/input switch. Also, to permit data read-out from either end
of the SPS, it was necessary to have two charge comparator circuits as shown
in Figure 6-11.

Figure 6-9. Second step of radix exchange sort and
merge algorithm.

The following paragraphs describe the operation of a FIFO/LIFO SPS
memory designed for the radix exchange sort and merge algorithm described
in Section 6.3.2.3.

6.3.2.4.1  Interlaced FIFO SPS Design

One  of  the  inherent  problems  encountered  in  the  designing  of  a  high 
density  SPS  memory  is  the  serial  to  parallel  and  parallel  to  serial  transfer 
interface.  The  use  of  a  four-phase  clocking  scheme  permits  the  use  of  the 
same polysilicon level for all serial/parallel transfer gates, as shown in
Figure 6-12.

The intermediate solution to increasing the density of an SPS and still
keeping  within  the  same  design  and  processing  rules  is  to  reduce  the  storage 
areas.  Significant  reduction  of  the  gate  length,  L,  is  limited  by  the 
design  and  processing  rules  for  shrinkage  and  alignment.  The  ability  to 
reduce  the  storage  area  is,  therefore,  limited  to  a  reduction  in  the  gate 
width,  W. 

As  W  is  decreased,  the  unused  space,  S,  between  the  parallel  registers 
increases.  To  make  use  of  this  space,  the  novel  interlace  technique,  shown 
in  Figure  6-13,  can  be  used.  As  can  be  seen  from  Figure  6-13,  another  set  of 
parallel  registers  have  been  inserted  between  the  original  ones. 

Interlacing of the parallel data is achieved by first transferring data
from the input serial register into parallel registers when charge packets
are under φ1 gates. Following this, the input register is loaded again and
the serial to parallel transfer performed when the data charge packets
are under the φ3 gates, thereby completing the serial to parallel interleaving
operation.

The serial to parallel transfer clock pulses are spaced 16.5 high speed
clock cycles apart as can be seen from the timing and potential flow diagrams
shown in Figures 6-14 and 6-15.

Referring to Figure 6-15, the charge transferred under the parallel phase
1 gate spreads back under the transfer gate during the time that the charge
under the phase 3 serial gate transfers under the phase 1 parallel gate. This
has no adverse effect. The parallel to serial transfer is more difficult
since we have to consider the mode when there are charge packets in adjacent
last φ4 parallel gates. Referring to Figure 6-16, parallel charge packets
are first transferred to the serial register when the serial φ1 clocks are
at their positive level. Next, the remaining parallel charge packets are
transferred to the serial register when the serial φ3 clocks are at their
positive level, thereby completing the parallel to serial de-interleaving
operation.

Figure 6-15. Potential diagram showing transfer of charges from
the serial to parallel register in the interleaved
SPS memory.
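
The interleaving and de-interleaving of one parallel row can be pictured with the following sketch. It is a data-ordering illustration only: the function names are hypothetical, and it assumes that the first serial load occupies the original parallel columns while the second load occupies the columns inserted between them.

```python
def interlace_row(first_load, second_load):
    """Combine two serial-register loads into one interleaved parallel row."""
    if len(first_load) != len(second_load):
        raise ValueError("both serial loads must have the same length")
    row = [None] * (2 * len(first_load))
    row[0::2] = first_load   # transferred while packets sit under phase-1 gates
    row[1::2] = second_load  # transferred while packets sit under phase-3 gates
    return row


def deinterlace_row(row):
    """Split an interleaved row back into the two serial-register loads."""
    return row[0::2], row[1::2]


if __name__ == "__main__":
    row = interlace_row([1, 0, 1, 1], [0, 0, 1, 0])
    print(row)
    print(deinterlace_row(row))
```

The sketch only shows that two serial transfers per row double the number of parallel columns served by one serial register; charge spreading under the transfer gates and the TG gate structure are outside its scope.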

In order to accomplish this transfer successfully, we employ a large
temporary storage gate area (TG2) that is connected to a dc voltage, in
addition to a second storage area (TG3) connected to a clock line, as shown
in Figures 6-16 and 6-17. The transfer from the parallel register to the
serial register is accomplished through the TG4 gate. The purpose of the
additional TG3 temporary storage gate is to reduce the distance and time of
charge transfer from TG2 to the φ1 or φ3 serial register gate.

The charge packet may transfer out of and back into the TG4 storage
gate in a similar manner to the serial-to-parallel transfer. Again, this
will have no adverse effect. The timing relationship between the four serial
clocks and the transfer clocks is shown in Figure 6-18.

6.3.2.4.2  Interlaced FIFO/LIFO SPS Memory Design

The  design  of  an  SPS  memory  capable  of  operating  as  either  a  FIFO  or 
LIFO  required  a  scheme  for  changing  the  direction  of  charge  transfer  both 
within  the  registers  and  also  between  parallel  and  serial  registers. 

The clock waveforms of the serial register 4-phase overlapping clocks
are shown in Figure 6-19 along with the clock phases necessary to transfer in
the opposite direction. By examining these waveforms it can be seen that to
change the direction of the charge transfer, it is only necessary to
interchange two of the clock phases; as an example, gates attached to φ1 and φ3
in the forward direction are changed to φ3 and φ1 respectively. Gates that
are attached to clock lines φ2 and φ4 are unchanged.
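
The effect of interchanging two clock phases can be illustrated with a very simplified model in which a charge packet always follows the adjacent gate whose clock goes high next. The model is an assumption made for illustration: it ignores clock overlap and potential levels, and the names `gate_phases`, `step` and `run` are hypothetical.

```python
def gate_phases(n_gates, swap_1_3=False):
    """Phase number wired to each gate of a 4-phase register.

    With swap_1_3 set, the gates normally driven by phase 1 and phase 3
    exchange their clock lines; phase 2 and phase 4 gates are unchanged.
    """
    order = [3, 2, 1, 4] if swap_1_3 else [1, 2, 3, 4]
    return [order[i % 4] for i in range(n_gates)]


def step(position, phases, active_phase):
    """Move the packet to the neighbouring gate driven by the active phase."""
    for neighbour in (position - 1, position + 1):
        if 0 <= neighbour < len(phases) and phases[neighbour] == active_phase:
            return neighbour
    return position


def run(position, phases, n_steps):
    """Advance the clock sequence 1 -> 2 -> 3 -> 4 -> 1 for n_steps."""
    active = phases[position]
    for _ in range(n_steps):
        active = active % 4 + 1
        position = step(position, phases, active)
    return position


if __name__ == "__main__":
    print(run(4, gate_phases(12), 4))                 # forward transfer
    print(run(6, gate_phases(12, swap_1_3=True), 4))  # reversed transfer
```

With the normal wiring the packet advances one gate per clock step toward higher gate numbers; with the two clock lines interchanged and the same clock sequence, it moves toward lower gate numbers.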

The serial to parallel transfer described above for the SPS memory
cannot be used for the LIFO since it will not enable the reverse transfer from
parallel to serial. Therefore it is necessary to use the same serial-to-
parallel transfer structure described for the interlaced SPS for both serial-
to-parallel and parallel-to-serial transfers for the LIFO memory.

Figure 6-17. Parallel to serial interface for the interleaved SPS
structure.

Timing diagram for the serial register and the parallel-to-serial transfer.



When the data transfers from the serial to parallel registers, either
at the top in the normal SPS read-in, or at the bottom in the LIFO read-out
mode, the transfer clocking arrangement has to be changed as shown in Figure
6-20. The three-gate transfer is changed to a single gate transfer and two
parallel clocks. The φ1, φ2, φ3 and φ4 parallel clocks are switched to φ2,
φ1, φ4 and φ3 respectively.

In switching the direction of transfer, we do not have to increase the
number of storage gates, so no gaps occur in the data. However, switching
the direction of charge transfer can only occur during the period when both
the φ1 and φ2 parallel clocks and the φ2 or φ4 serial clocks are at their
positive levels.

In order to read data out of the input serial register, we add an extra
output port to the first φ1 serial input gate as shown in Figure 6-21.

Two additional φ4 clock lines are required; both have the same
phase and voltage as the standard φ4 serial clocks but in addition must have
the capability of being switched to ground independently. The gates attached
to these additional φ4 clock lines (φ4A and φ4B) act as charge barriers when
the lines are switched to ground.

The  typical  input  clocking  and  gating  for  the  SPS  memory  is  shown  in 
Figure  6-20  and  the  same  physical  structure  with  the  clocking  switched  to 
LIFO  read-out  mode  is  shown  in  Figure  6-21.  Note  that  when  the  clocks  were 
switched from write to read the φ1 and φ3 clock lines were reversed to
change  the  direction  of  charge  flow  as  described  previously. 

The  design  of  this  FIFO/LIFO  memory  was  performed  and  incorporated  into 
a test mask cell. Program funding and schedule constraints did not, however,
permit the functional evaluation of this design.

Figure 6-21. Input and output data flow block diagram for the LIFO


7.0  DCCD  WAFER  PROCESSING 


7.1  INTRODUCTION 

Processing  of  digital  charge-coupled  devices  (DCCD's)  has  undergone  an 
extensive  evolution  over  the  past  5  years  affecting  every  step  of  the 
processing  technology.  Key  objectives  of  CCD  process  development  were 
increased  circuit  density,  not  only  to  reduce  the  number  of  chips  needed 
to  build  a  complex  signal  processing  system,  but  to  enable  higher  clock 
rates  as  well.  Every  aspect  of  MOS  processing  technology  was  involved; 
this  resulted  in  major  changes  in  processing  methods  and  device/circuit 
design  and  layout  methods. 

A summary of CCD Processing Laboratory accomplishments during the 1975 -
1979 period includes:

1.  Development of a process technology that resulted in successful
fabrication of key DCCD building blocks (e.g., full adder, half
adder, refresh cell).

2.  Produced  several  generations  of  DCCD  chips  that  performed  in  a 
uniform,  repeatable  manner. 

3.  Developed  standardized  processing  procedures  and  controls,  also, 
in-process  monitoring,  that  resulted  in  tighter  device  tolerances 
with  resulting  increases  in  circuit  density  and  performance. 

4.  Simplified  and  calibrated  major  processing  steps;  this  included 
field  and  gate  oxide  growth;  polysilicon,  nitride  and  metallization 
depositions;  dopant  implantation  and  drive  procedures;  photoresist 
coating,  development  and  removal  methods;  and  plasma  etching  of 
polysilicon  and  nitride  films. 

5.  Increased  the  level  of  automation  of  key  processing  steps;  this 
included  gate  oxide  growth,  photoresist  coating,  and  plasma 
etching  of  poly  and  nitride  films. 

7.2  PROCESSING  EVOLUTION 

Current processing technology used to produce DCCD's includes a long
list of processing changes that were the result of thousands of man hours
of wafer processing, painstaking test and evaluation, subsequent design
rule changes, and equipment modification. Some of the more important
processing changes that permitted fabrication of DCCD circuit building
blocks included:

1.  Conversion from P-channel to N-channel devices, as a means of
increasing device uniformity and achieving higher clock frequencies.

2.  Improving  circuit  yields  by  process  step  simplification. 

3.  Rejecting  the  polysilicon  gate/metal  gate  technology  in  favor 
of  a  dual  poly-gate  technology. 

4.  Employing a uniform 1,000 Å gate oxide under poly I and II gate
levels.

5.  Reducing field oxide thickness to 10,000 Å, to insure satisfactory
step coverage by metal conductor patterns.

6.  Developing a plasma etch technology, to insure high density
polysilicon gate structures.

7.  Doping  of  Poly  I  and  II  level  films  with  Phosphorus,  rather  than 
employing  both  N  and  P  type  dopants,  with  possible  contamination 
of  thin  gate  oxides  by  Boron. 

8.  Standardizing metal sintering temperature at 400°C, for significantly
reduced Qss.

9.  Employing TEOS in lieu of SILOX or thermally grown SiO2 for smooth
step coverage by metal conductor lines.

10.  Improved  contact  hole  etching  techniques,  enabling  a  significant 
reduction  in  both  poly  and  metal  line  widths. 

11.  Identifying  the  depletion  barrier  or  BUMP  phenomenon,  occurring 
at  the  edge  of  Poly  I  structures;  this  structural  anomaly  reduces 
gate  control  over  charge  packets,  resulting  in  a  series  of  barriers 
between  gates  that  may  reduce  the  transfer  efficiency  of  CCD 
devices.

12.  A  P-type  buried  channel  lot  was  fabricated  in  1977  that  demonstrated 
buried  channel  operation  of  a  10-bit  shift  register.  This  verified 
the  changes  made  to  the  DP-3  process  to  provide  buried  channel 
operation. 

     PROCESS EVOLUTION          1975                         1979

 1.  Metallization
       Line Width               7.5 µ                        5.0 µ
       Thickness                20,000 Å                     10,000 Å
       Metallization System     Pure Aluminum                4% Cu-doped Al
       Substrate Heating        Deposited at Ambient (25°C)  Deposited at 300°C

 2.  Contact Holes              7.5 µ x 7.5 µ                2.5 µ x 5.0 µ

 3.  Line Separation            7.5 µ                        1.27 µ

 4.  Oxide Cut (windows)        2.5 µ                        1.27 µ

 5.  Poly Gate Overlap          2.5 µ                        1.9 µ

 6.  Gate Levels                Polysilicon and Metal        Both gate levels
                                                             are Polysilicon

 7.  Gate Oxides                Poly Level - 1,000 Å         Gate oxides for
                                Metal Level - 2,000 Å        both levels are
                                                             1,000 Å or less

 8.  Etching Methods            Wet Etching Processes        Plasma Etching used
                                Only                         for Poly, Nitride;
                                                             Wet Etch used for
                                                             Oxides, Metal

 9.  Field Oxide                15,000 Å                     7,000 - 8,000 Å thick

10.  Channel Definition         15,000 Å Field Oxide Cut     Channel Stops used
                                                             to define channel

11.  CCD Technology Selected    P-Channel                    N-Channel

A  comparison  can  be  made  of  the  basic  processing  that  was  used  to
produce  DCCD  devices  in  1975  with  current  processing  methods.  A  comparison 
of  resulting  device  parameters  can  also  be  used  as  a  gage  of  the  progress 
made  during  this  5  year  period. 

7.3  PROCESSING  HISTORY 

The very first wafer lots produced in 1975 were designated N1 through
N4. The circuit under development at that time was the full adder, which
used a floating gate to control one of two paths between the adder input
and the SUM output. Lots N1 and N2 were fabricated with 1,000 Å gate oxides
under the polysilicon electrodes. Initial objectives were to determine
whether the full adder structure could be processed as designed and to detect
any possible failure mode. There was some concern over the thickness of the
field oxide (15,000 Å), as a high "step" was formed upon etching the channel
cut. Thin photoresist covering the edge of this cut could permit the
formation of pinholes in the oxide during contact hole etching.

Lots N3 and N4 employed the "standard" DCCD process. These lots
basically evaluated a thermal oxide protective film in combination
with a thin phosphosilicate glass, to prevent the formation of pinholes.

These two lots had other problems associated with the operational
characteristics of the full adders: both poly-gate FETs and metal-gate FETs
had severe threshold voltage problems, including high Vt values for the
poly-gate FETs and shorted metal-gate FETs. It was suspected at that time
that both boron and phosphorus had completely penetrated the gate oxides,
producing the high thresholds and shorts. The substrate material used for the
N1 through N4 lots was N-type <100>, 1 - 3 ohm-cm.

NAV-5 and NAV-6 were then produced; both lots had 500 Å of thermal oxide
under the floating gate and 2,000 Å thermal oxide between the floating
poly-gate and the "carry" (metal) gate. Surface potential plots of the
metal and poly gate test FETs of NAV-5 indicated that the phosphorus used
to dope the poly gate had penetrated the 500 Å gate oxide. NAV-6 received
a lower temperature dopant drive-in schedule; Vt values looked normal.

NAV-7 and NAV-8 were produced during May 1975. Lot 7 employed 1,000 Å
gate oxides and did not indicate phosphorus penetration through the gate
oxide. Lot 8 was a departure from standard processing procedure as a
complex oxide/nitride gate dielectric was employed. Unfortunately, this
lot did not produce testable devices as the etching of the FET source-
drains also attacked the polysilicon covering the CCD devices.

NAV-9 was produced with a 600 Å gate oxide. Results were indeterminate
due to circuit functional problems. NAV-10 and NAV-11 were then fabricated,
using the identical process sequence of NAV-9. These lots employed both all
wet and all dry oxides as a means of isolating the floating gate element;
none of the combinations selected cured the charge accumulation problem,
thereby indicating the necessity for a design change.

7.3.1  Introduction  of  the  "DP"  Mask  Series 

In  order  to  overcome  difficulties  with  the  full  adder’s  floating  gate 
configuration,  several  device  structures  were  conceived  that  allowed  discharge 
or  presetting  of  the  floating  gate  itself.  Before  these  new  device  designs 
were  included  in  the  DP-1  mask  set,  a  small  test  device  mask  set,  designated 
DP-0  was  designed,  which  permitted  verification  of  the  floating  gate  FET 
discharge  concept. 

The  floating  gate  amplifier  was  subsequently  modified  to  incorporate 
a  FET  discharge  device  for  its  floating  gate  structure.  The  DP-0  mask  set 
was  generated  using  the  slip  mask  technique.  This  technique  was  pursued  in 
order  to  reduce  mask  costs  by  allowing  several  mask  levels  to  be  combined 
on  one  reticle.  The  circuits  included  on  DP-0  required  6  mask  levels;  this 
necessitated  two  reticles  consisting  of  3  levels  each.  It  must  be  realized 
that  this  technique  decreased  the  number  of  usable  circuits,  since  during 
the  processing  steps,  some  potential  circuit  positions  are  not  available 
as  a  result  of  slip  mask  operation.  As  DP-0  was  created  to  evaluate  basic 
circuit  concepts,  it  was  agreed  that  20  die  positions  were  sufficient  to 
verify  design  concepts.  Approximately  30  circuits  were  obtained  on  the 
available  2-inch  wafers. 
Two DP-0 lots were initiated during this period (September 1975).

DP-0-1  had  both  oxides  (on  either  side  of  the  poly  floating  gate)  grown 
in  a  wet  atmosphere,  while  DP-0-2  wafers  had  both  oxides  surrounding  the 
floating gate grown in a dry atmosphere. Oxide thickness under the
polysilicon electrode was 1,000A, while the oxide under the metal electrode
was 2,000A. The field oxide thickness was 15,000A. The polysilicon film was
phosphorus  doped  and  was  covered  by  a  "poly  protect"  nitride  film.  Both 
poly  and  nitride  films  were  plasma  etched.  PMOS  source-drains  were  doped 
by  diffused  boron.  Pure  aluminum  was  used  as  metallization;  sintering  was 
accomplished  at  450°C  in  a  nitrogen  ambient  followed  by  sintering  at  the  same 
temperature  in  hydrogen. 

Four  DP  series  lots  were  fabricated  during  the  months  of  September  - 
October  1975.  Basic  variations  were  made  in  gate  oxide  thickness  for  both 
poly  and  metal  gate  structures  as  follows: 

LOT DESIG.   GATE OXIDE LEVEL   THICKNESS   WET/DRY THERMAL OXIDE   TEMPERATURE

DP-0-1       Poly Level         1,000A      Wet                     920°C
             Metal Level        2,000A      Wet                     920°C

DP-0-2       Poly Level         1,000A      Dry                     1075°C
             Metal Level        2,000A      Dry                     1075°C

DP-0-3       Poly Level         1,000A      Dry                     1075°C
             Metal Level        2,000A      Wet                     920°C

DP-0-4       Poly Level         1,000A      Wet                     920°C
             Metal Level        2,000A      Wet                     920°C

The use of a slip mask on DP-0 lots proved to be successful. Information
gathered on the DP-0 lots was reflected in DP-1 mask designs. Tests of
DP-0 circuits indicated that the addition of the discharge FET to the
floating gate structure insured that floating gate performance would no
longer be susceptible to charge accumulation on the gate structure.
DP-0-5 was produced on <111> material; this material was selected as
a means of raising the field oxide threshold voltages. Substrate bias was
used on previous lots fabricated on <100> material; in order to avoid this
circuit complication, <111> material was used in the hopes that it would
produce devices with higher threshold voltages. An additional oxidation
step  was  included  following  polysilicon  gate  definition.  This  oxidation 
step  was  designed  to  prevent  source-to-drain  shorts  resulting  from  poly¬ 
gate  definition,  standard  PMOS  source-drain  and  diffusion  and  drive  procedures 

Lot  DP-0-6,  completed  in  December  1975,  did  not  provide  useful  devices 
due to metal discontinuities at many polysilicon-to-metal contact regions.

During  the  initial  months  of  1976,  several  DP-1  lots  were  completed 
and  tested.  In  general,  the  process  used  for  these  lots  followed  the  same 
step  sequence  employed  on  the  DP-0  lots.  A  "poly  protect"  mask  was  added 
to  the  processing  sequence,  to  prevent  etching  the  field  oxide  under  the  poly¬ 
silicon  structures.  The  resulting  undercut  of  the  poly  lines  produced 
steps that were difficult to cover by the Aluminum interconnects. In addition,
TEOS was again used as a protective film to cover poly structures on lot
DP-1-2;  lots  DP-1-1  and  3  were  fabricated  without  a  TEOS  film.  Electrical 
tests  performed  on  these  lots  indicated  a  large  number  of  breaks  in  the  metal 
conductors  where  they  traversed  combined  field  oxide  and  polysilicon  edges. 

DP-1-5  and  DP-1-6  were  therefore  fabricated  with  a  TEOS  film  over  poly 
gate structures. A second processing change increased the gate oxide
thickness under the metal gates to 3,000A. Subsequent electrical testing of
these lots indicated metal gate oxide contamination; the decision was then
made to process the following lots without a TEOS film. Lot DP-1-7 was
processed this way, as was lot DP-1-9, which was divided so that 2 of its
4 wafers were fabricated with a poly protect mask. The DP-1-9 lot produced
devices that provided needed test results. The information acquired was
used in the redesign of the DP-1 mask set and for the DP-2 mask set, which
contained an 8 + 8 bit adder, a 3 x 3 bit multiplier and a 4 + 4 bit adder
employing the half adder design. DP-1-10 was fabricated
during  June  1976,  again  employing  a  TEOS  film  over  the  polysilicon  gates  and 
supporting  structures.  As  the  DP-1-10  mask  set  included  the  circuit  correc¬ 
tions  discovered  during  tests  on  earlier  lots,  functioning  multiplier  and  adder 
arrays  were  obtained. 
7.3.1.1 The Double Polysilicon Process

During  the  month  of  August  1976,  two  DP-2  lots  (DP-2-1  and  DP-2-2)  were 
started  using  the  new  double  poly  process.  This  required  some  modification 
of  the  DCCD  process  sequence.  Although  the  poly-metal  gate  process  was  con¬ 
sidered  a  simpler  process,  problems  with  metal  step  coverage  were  sufficiently 
severe  to  necessitate  conversion  to  the  new  processing  approach.  As  the 
new  DP-2  series  was  fabricated  with  slip  masks,  alignment  problems  became  an 
over-riding  consideration  in  producing  useful  lots.  Although  the  mask 
supplier  provided  TRW  with  a  new,  more  closely  aligned  set  of  slip  masks, 
the  problem  of  using  the  slip  mask  set  continued.  The  initial  DP-2  lots 
produced  with  these  masks  necessitated  special  alignment  procedures;  the 
circuits  were  aligned  by  ignoring  the  alignment  targets  and  aligning  the 
contact  hole  mask  to  the  polysilicon  structures  already  on  the  wafers,  as 
well  as  the  observable  diffused  areas. 

Tests  performed  on  the  DP-2  8+8  adder  array  indicated  a  large  number 
of  Schottky  diode  occurrences  on  clock  lines;  these  unwanted  diodes  were 
injecting  charges  into  the  channels.  An  investigation  of  this  occurrence 
showed  that  the  diodes  occurred  where  an  aluminum  contact  was  made  to  poly¬ 
silicon  over  a  1,000A  channel  oxide.  The  large  number  of  potential  aluminum- 
to-poly  contact  failures  on  the  8+8  array  made  it  impossible  to  find  a 
failure-free  device  for  functional  tests.  The  full  adder  test  cell  also 
suffered  from  the  same  aluminum-to-poly  contact  problem. 

Additional  process  modifications  were  made  during  this  period  (October  1976) 
to  determine  the  gate  oxide  thickness  relationship  between  Poly  I  and  II 
gate levels; DP-2-3 was a split lot, with half the wafers having 1,000/2,000A
gate oxides, while the other half were made with 1,000/3,000A. Results as
to  which  oxide  combination  produced  the  most  usable  devices  were  inconclusive, 
due  to  alignment  problems  with  the  slip  mask  set,  which  produced  unusable 
devices. Recurring problems forced the decision to have the DP-2 mask set
remade  as  a  conventional  set,  with  a  separate  reticle  for  each  mask  layer. 

A  variety  of  experiments  were  performed  to  eliminate  contact  problems,  with 
heavy  concentration  on  selecting  the  most  appropriate  sintering  temperatures. 
The new conventional mask set for DP-2 circuits was obtained during
December  1976  and  was  designated  as  DP-2A.  This  new  mask  set  was  used  to 
produce  lot  DP-2A-1.  The  nitride  step  normally  used  to  define  Poly  II  was 
deleted.  The  Poly  II  gate  film  was  therefore  doped  during  the  PMOS  source- 
drain  diffusion  sequence.  Low  voltage  thresholds  obtained  when  this  lot 
was  tested  were  attributed  to  boron  penetration  through  the  gate  oxide 
during  the  TEOS  densification  cycle  that  occurs  at  higher  temperatures.  The 
poor  results  with  this  lot  necessitated  a  return  to  the  nitride  protective 
film  to  prevent  boron  penetration  of  the  gate  oxides  during  PMOS  source-drain 
diffusions.  During  this  period,  lot  DP-2A-2  was  completed.  This  lot  was 
produced  with  a  conventional  Arsenic-doped  Poly  II  layer.  DP-2A-2  had  diffi¬ 
culties  with  the  source-drain  diffusion  process.  The  sheet  resistance  of  the 
diffused  region  was  extremely  high  resulting  in  poor  device  operating  charac¬ 
teristics. 

7.3.1.2 The Switch to N-Channel DCCD Technology

Early  in  1977,  the  decision  was  made  to  fabricate  an  N-channel  DP-2A 
lot.  The  motivating  force  for  this  decision  was  improvement  of  the  data  rate 
through  the  arithmetic  arrays.  As  predicted,  tested  transistors  operated  in 
the  depletion  mode;  however  gate  voltage  versus  surface  potential  (VgVs) 
curves  were  recorded  by  applying  a  large  back-bias  to  the  substrate.  Clock 
voltages  were  computed  for  the  DP-2A-1N  N-channel  devices,  to  permit 
testing  of  the  various  arithmetic  arrays. 

Lot  DP-2A-1N  was  made  with  P-type  <100>  17  -  30  ohm-cm  wafers;  the 
process was similar to the P-channel process, except that phosphorus was
used for the source-drain (NMOS) diffusions. Junction depth (Xj) was
approximately 1u. Initial results indicated surface "channels" that were created at the
PN  junctions;  this  produced  high  leakage  diodes.  These  "channels"  were 
probably  caused  by  the  oxide  fixed  charge  or  Qss.  New  wafer  lots  were 
initiated  using  0.8  ohm-cm  material;  the  objective  here  was  to  prevent 
unwanted  channel  formation.  Lot  DP2-2N  employed  0.8  ohm-cm  material,  while 
lot  DP2-3N  used  3-5  ohm-cm  material.  Testing  of  these  new  lots  was  minimal; 
it was realized early in the testing cycle that these lots would produce
devices with excessive surface leakage currents, which is not unexpected for
N-channel devices without channel stops. The decision was made to concentrate
on  DP3  designs,  which  were  specifically  designed  for  N-channel  operation. 
7.3.1.3 The DP3 Series


The  DP3  series  of  lots  contained  an  8  x  8  multiplier  array,  a  16  +  16 
adder  array  and  various  test  cells  and  experimental  circuits.  In  April  1977, 
the  first  DP3  masks  were  received  and  device  lots  were  initiated.  A 
process  definition  effort  was  also  started  for  a  buried  channel  version  of 
DP3. 

Lots  DP3-1  and  DP3-2  were  completed  during  May  1977.  Problems  were 
experienced  with  polysilicon  definition,  due  to  photoresist  lifting  (poor 
adhesion).  The  process  was  improved  by  subjecting  the  sample  wafers  to  a 
30-minute  oxidation  at  900°C  previous  to  photoresist  coating;  also,  the  AZ111 
photoresist  developer  was  further  diluted.  Some  difficulties  were  experienced 
with  contact  hole  and  metal  line  definition  due  to  overetching.  Exposure 
time  and  development  were  adjusted  to  solve  this  problem.  Gate  voltage  vs 
surface  potential  plots  were  generated  for  several  positions  on  each  DP3-1 
wafer.  These  curves  were  satisfactory  for  the  oxide  thickness  used.  Gate 
shorts  were  found  to  be  quite  prevalent  on  this  lot,  so  a  complete  gate  short 
characterization  was  undertaken;  60  serviceable  die  locations  were  found  on  5 
wafers.  The  decision  was  made  to  define  all  MOSFET  channels  with  both  thick 
oxide  (10,000A)  and  channel  stops.  This  device  design  approach  prevented 
breakdown  between  the  inversion  charge  layer  and  the  channel  stop  in  MOSFET 
structures. 

Lot  DP3-3BC  was  initiated  during  this  period.  This  lot  was  fabricated 
using  a  buried  channel  process  (N-Buried  Channel)  that  was  similar  to  the 
surface channel process. It had an additional photoresist step (implant
mask) and a B11 ion implant to produce the buried channel. Three different
implants were used for this lot:

1. 1 x 10^12 @ 100 KEV Boron B11

2. 2 x 10^12 @ 100 KEV Boron

3. 1 x 10^12 @ 150 KEV Boron B11
Upon completion of DP3-3, it was found that better poly and metal
resolution was obtained with a 10,000A field oxide than with 15,000A.

It  was  concluded  that  added  field  oxide  film  thickness  would  be  necessary 
to  optimize  channel,  polysilicon  and  metal  line  definition.  DP3-3  provided 
poor yields caused by the extremely close proximity of the channel to adjacent
diffusions, which resulted in punch-through at low voltages. Lot DP3-4 was
then initiated and an attempt was made to maximize channel-to-diffusion
separation by changing the photoresist process.

Lot  DP3-3BC  was  completed  during  this  period  and  buried  channel  operation 
was  demonstrated. 

Lot  DP3-4  was  completed  during  the  month  of  July  1977.  Initial  field 
oxide thickness was held to 8,000A. The final field oxide under Poly II was
7,000A. As a result of previous channel-to-diffusion punch-through, the
spacing between these two areas was increased from 5.0u to 7.5u, requiring a
major mask set revision. Lots made with the revised mask set were designated
DP3A.

The first lot of 10 wafers, DP3A-1, was completed during August 1977; this
lot was processed with an 8,000A field oxide and 1,000A/2,000A gate oxides
for poly I and II. The yield obtained from this lot was satisfactory,
permitting characterization of the arithmetic cells.

Two additional lots, DP3A-2 and 3, were completed during September 1977
and also provided satisfactory device yields. A new lot (DP3A-4) was started
with a more lightly doped 8-12 ohm-cm substrate material. This lot
produced devices with very low source diode breakdown (punch-through). Lot
DP3A-6BC was then started with the channel region implanted with phosphorus
(Dose: 1.5 x 10^[illegible] cm^-2 @ 100 KEV; drive-in was 36 HRS @ 1075°C).

Lot DP3A-6BC was completed in November 1977. This lot employed a field
oxide of 15,000A which resulted in a number of shorts between Poly I and II
levels.  The  transition  of  polysilicon  films  from  field  oxide  to  channel 
level  caused  fracturing  of  the  oxide  layers  between  the  two  poly  levels. 

The  resultant  topography  created  yield  problems  due  to  broken  metal  lines 
as  well. 


Lot  DP3A-7  was  completed  during  this  period  and  was  processed  using 
10  ohm-cm  substrate  material.  Differences  between  this  lot  and  its  pre¬ 
decessors  included  the  following: 

1.  FETs  were  fabricated  in  implanted  tubs  that  provided  a  4 
ohm-cm  background  for  the  NMOS  devices. 

2. The field oxide height was reduced from 15,000A to 10,000A
resulting  in  some  improvement  in  shorting  problems  seen 
with  the  previous  lots. 

An  in-depth  evaluation  of  key  device  characteristics  indicated  the 
presence of a depletion barrier between poly I and II levels. This so-called
"bump" occurred at the edges of poly I gate structures and is caused by the
lifting of the polysilicon film during growth of the poly II gate oxide.
SEM photographs, represented by Figure 7-1, illustrate the distortion of the
poly I gate structure at its edges. The non-planarity created at the edge
of the gate reduces the control provided by the gate, resulting in a series
of barrier structures that reduce the transfer efficiency of these devices.
In essence, a succession of bumps acts like a continuous series of charge
traps that can affect half-adder or full adder operation.


Figure  7-1.  An  SEM  photograph  showing  the  vertical  tilt 
at  the  end  of  the  poly  I  gate. 
Lot DP3A-8 was completed in December 1977; this lot employed gate
oxides of 800A and 2400A. This large difference in gate oxide thickness
was chosen in order to provide a large difference in poly I and II
surface  potentials  in  an  attempt  to  overcome  the  race  conditions  described 
in  the  November  1977  Interim  Report. 
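
The dependence of surface potential on gate oxide thickness can be
illustrated with the standard depletion-approximation expression for an
empty MOS well. The sketch below is only an illustration: the substrate
doping (1.3 x 10^15 cm^-3, roughly a 10 ohm-cm P-type wafer), the zero
flat-band voltage and the 10 V gate level are assumed values, not figures
taken from this report, and the function name is hypothetical.

```python
import math

Q = 1.602e-19          # electron charge, C
EPS0 = 8.854e-14       # vacuum permittivity, F/cm
EPS_SI = 11.7 * EPS0   # silicon permittivity, F/cm
EPS_OX = 3.9 * EPS0    # SiO2 permittivity, F/cm


def empty_well_surface_potential(v_gate, t_ox_angstrom, n_a=1.3e15, v_fb=0.0):
    """Surface potential (V) of an empty MOS well in deep depletion.

    n_a (cm^-3) and v_fb (V) are assumed values, not taken from the report.
    """
    c_ox = EPS_OX / (t_ox_angstrom * 1e-8)   # oxide capacitance, F/cm^2
    v0 = Q * EPS_SI * n_a / c_ox ** 2        # characteristic voltage, V
    vg = v_gate - v_fb                       # effective gate voltage, V
    return vg + v0 - math.sqrt(v0 * v0 + 2.0 * v0 * vg)


# Compare the oxide thicknesses mentioned in the text at one gate voltage.
for t_ox in (800, 1000, 2000, 2400):
    psi = empty_well_surface_potential(10.0, t_ox)
    print(f"{t_ox:5d} A oxide: surface potential {psi:5.2f} V at 10 V gate")
```

Under these assumptions the thinner oxide gives the deeper well at a
given gate voltage, which is the effect the 800A/2400A split was meant
to enlarge.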

Tests performed on DP3A-8 produced working devices; however, test
results obtained from the cascaded dual half adder indicated very poor
transfer efficiency. These results indicated that the change from the
conventional 1000A/2000A gate oxide configuration was a poor one, as the
transfer efficiency of these test devices deteriorated with the 800A/2400A
gate oxide thicknesses. Based on these initial test results,
it was decided to suspend further testing of devices and DCCD cells on this
lot.

Two additional lots, DP3A-9 and DP3A-10, were initiated during this period;
the specific goal of these lots was to eliminate or substantially reduce the
"bump" problem. Upon completion, lot DP3A-10 provided very uniform VgVs
plots, indicating a 4 volt potential difference between poly I and II curves.
The wafers of this lot were specifically processed to reduce the size of
the interface barriers between poly I and II gates. Testing of the so-called
charge barrier or "bump" indicated a spread of values that was not
significantly better than that obtained from lot DP3A-8. The special "bump
removal" processing involved removal of poly I material along the edge of
each gate structure. This material is normally undercut and therefore
unsupported during chemical etching of the oxide areas to be occupied by
poly II gate and gate oxide films. In the event that the poly I unsupported
edge is not etched away, the subsequent poly II gate oxide growth fills in
the undercut region and causes this fringe to lift up. This lifting effect
provides increased gate oxide thickness along poly I gate edges, thereby
creating the intergate potential barrier or "bump."

The  Nitride  Sandwich  Experiment 

A second approach to insure reduction or possible elimination of the
intergate barrier was devised and included as a process sequence
modification on lot DP3A-9. This lot employed an Si3N4 - SiO2 dielectric
sandwich that provides an absolutely planar dielectric surface under poly I
and II gate structures. Some significant processing problems were presented
by the Si3N4 - SiO2 dielectric layers. A tough, etch-resistant
silicon-oxy-nitride film was formed on the wafer's surface during oxidation
of poly I gate structures, preceding deposition of the poly II film. This
silicon-oxy-nitride film did not etch uniformly in the barrel-type plasma
etcher which normally is used to etch and define source-drain regions for
input and output FET circuits. This tough film was removed, however, by
a phosphoric acid etch bath, which also affected adhesion of the photoresist
film protecting the surface of the wafer. An extended bake of the
photoresist was tried, before the phosphoric acid bath, in an attempt to
solve this problem.

Extensive  test  and  evaluation  of  DP3A-9  provided  the  following  informa¬ 
tion: 

1.  Examination  of  the  oxide-nitride-dielectric  sandwich  indicated 
an  oxide  thickness  of  800A;  the  nitride  layer  was  approximately 
600A  thick,  providing  a  dielectric  with  a  total  equivalent  value 
equal to 1,100A oxide (SiO2).

2.  CV  curves  showed  significant  distortion  from  those  normally  expected, 
which  could  not  be  attributed  to  lot  contamination.  (BT  stress 
tests  verified  the  absence  of  alkali  ion  contamination.) 

3. Further testing of poly capacitors employing the Si3N4 - SiO2
sandwich,  indicated  extremely  low  leakage  characteristics.  (This 
particular  lot  did  not  receive  extensive  device  or  cell  testing 
as  it  was  fabricated  specifically  to  evaluate  the  basic  char¬ 
acteristics  of  the  dielectric  film  sandwich  as  a  means  of  solving 
the  intergate  "bump"  phenomenon.) 
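
The 1,100A equivalent value quoted in item 1 can be checked by scaling the
nitride thickness by the ratio of the two permittivities. The sketch below
assumes textbook relative permittivities of 3.9 for SiO2 and 7.5 for Si3N4;
these constants and the function name are assumptions for illustration and
do not come from the report.

```python
# Relative permittivities: assumed textbook values, not taken from the report.
EPS_SIO2 = 3.9
EPS_SI3N4 = 7.5


def equivalent_oxide_thickness(oxide_a, nitride_a,
                               eps_ox=EPS_SIO2, eps_nit=EPS_SI3N4):
    """SiO2 thickness (angstroms) with the same capacitance as the stack."""
    # Two dielectrics in series: the nitride layer behaves like a thinner
    # oxide layer, scaled by the permittivity ratio.
    return oxide_a + nitride_a * eps_ox / eps_nit


# Measured DP3A-9 stack from item 1: 800A oxide plus about 600A nitride.
eot = equivalent_oxide_thickness(800.0, 600.0)
print(f"Equivalent oxide thickness: {eot:.0f} A")
```

With these assumed constants the result rounds to the 1,100A figure given
in item 1.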

7.3.2  NE-1  and  NE-2  Test  Patterns 

The  design  of  a  DP5  chip  containing  a  32-bit  adder/subtractor/exclusive- 
OR  array  had  become  so  complex,  that  it  was  decided  to  generate  a  series 
of  N-channel  test  patterns  which  would  prove  out  segments  of  the  DP5 
design.  The  NE-1  mask  set  therefore  contained  twelve  separate  N-channel 
test structures. The second test chip, designated NE2, contained a
multiplier cell capable of multiplying two 5-bit 2's complement numbers
and delivering the product as a 9-bit 2's complement number. It also contained
twelve cascaded full adders with their carry outputs taken to bonding pads.
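
The arithmetic performed by such a multiplier cell can be modeled in a few
lines. The sketch below is a behavioral model only, with hypothetical
function names; it says nothing about the cell's internal CCD structure.
It also shows that a 9-bit 2's complement product covers every pair of
5-bit operands except one, and how the NE2 cell treated that case is not
stated in the report.

```python
def to_twos_complement(value, bits):
    """Encode a signed integer as an unsigned bit pattern of the given width."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lo <= value <= hi:
        raise OverflowError(f"{value} does not fit in {bits} bits")
    return value & ((1 << bits) - 1)


def from_twos_complement(pattern, bits):
    """Decode an unsigned bit pattern of the given width as a signed integer."""
    sign_bit = 1 << (bits - 1)
    return (pattern ^ sign_bit) - sign_bit


def multiply_5x5(a_bits, b_bits):
    """Multiply two 5-bit 2's complement patterns into a 9-bit pattern."""
    a = from_twos_complement(a_bits, 5)
    b = from_twos_complement(b_bits, 5)
    return to_twos_complement(a * b, 9)


# Example: -3 x 7 = -21, shown as a 9-bit pattern.
product = multiply_5x5(to_twos_complement(-3, 5), to_twos_complement(7, 5))
print(f"{product:09b}", from_twos_complement(product, 9))

# Search all operand pairs for products that do not fit in 9 bits.
overflow_pairs = []
for a in range(-16, 16):
    for b in range(-16, 16):
        try:
            multiply_5x5(to_twos_complement(a, 5), to_twos_complement(b, 5))
        except OverflowError:
            overflow_pairs.append((a, b))
print(overflow_pairs)
```

The only pair reported by the search is (-16, -16), whose product of 256
is one more than the largest 9-bit 2's complement value.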


Three NE1 lots were fabricated during February 1978, using the baseline
N-surface channel process sequence. In addition, an NE2 lot was also
processed using boron-implanted 25 ohm-cm P-type material, to reduce its
resistivity to less than 10 ohm-cm. The resultant devices were to be
compared with those on NE1-1 which was fabricated of 25 ohm-cm as well. The
difference in impurity concentration between the three NE1 lots was
sufficient to determine the impact of starting material resistivity
on DCCD device performance. Test results from lot NE1-1 were
satisfactory and operating device characteristics were obtained. Initial
tests of lot NE2-1 were suspended due to excessive dark current.

Fabrication of an N-buried channel lot (NE1-1BC) was also started during
this period. It was produced with 20 ohm-cm P-type material. The phosphorus
(channel) implant was 1 x 10^12 cm^-2 @ 150 KEV. The boron channel stop
implant was 1 x 10^14 cm^-2 @ 100 KEV. High temperature annealing (1075°C)
of implant damage was reduced to a minimum to provide devices that would
withstand high ionizing radiation doses.

A new surface channel lot designated NE1-3 was started in April 1978;
its purpose was to produce half adders and shift registers for radiation
hardness evaluation. Several nominal process modifications were suggested
as a means of achieving desired goals, which included resistance-heated
evaporation of the aluminum metallization film. Mechanical abrasion of the
backs of all NE1-3 wafers was used to create stacking faults; these faults
are used to trap heavy metal ions that migrate through the substrate during
high temperature processing steps. In addition, heavy phosphorus gettering
of wafer backs was also employed during NMOS source-drain doping
to reduce the number of "S" pits and dislocation sites, thereby lowering the
dark current in the completed devices.

As the NE1-3 wafer lot was to be subjected to extensive ionizing radiation
tests, CCD Processing personnel endeavored to locate a source of
resistance-heated metallization, without success. This lot was therefore
metallized with Cu-doped Aluminum, using our standard E-beam deposition
system. It was felt at the time that these devices would have their
inherent radiation resistance reduced by employing E-beam deposited metal.
Subsequent tests verified that these devices could withstand an ionizing
dose of 10^4 to 10^[illegible] rads (Si).
Lot NE1-4 was started in May 1978, as a means of further evaluating the
compound dielectric sandwich of SiO2 - Si3N4. An error in the processing
sequence caused the lot to be scrapped; it was replaced immediately by NE1-5.

NE1-5 was produced as a twelve wafer lot. Its purpose was to determine
the  effectiveness  of  the  compound  dielectric  sandwich  and  to  determine 
whether  this  type  of  dielectric  would  create  any  unique  device  operating 
characteristics.  This  lot  was  divided  as  follows: 

1. Gate dielectrics consisting of 600A of SiO2, followed by
CVD-deposited Si3N4 with a thickness of 200A.

2. Gate dielectrics consisting of 600A of SiO2, followed by
CVD-deposited Si3N4 with a thickness of 400A.

3. Two standard control wafers with conventional 1,000A of SiO2
were also included.

Results from this lot were inconclusive due to an equipment malfunction
during the Si3N4 deposition operation.

7.3.2.1 SILMAT-Processed Wafers

Lot  NE1-6S  was  a  ten  wafer  lot  produced  in  May  1978.  Its  purpose  was 
to  evaluate  SILMAT-processed material  to  determine  whether  it  produced  devices 
with  significantly  lower  dark  current  values.  These  wafers  were  obtained 
from  Silicon  Materials  Inc.,  Sunnyvale;  these  specially  processed  wafers 
are  supposed  to  reduce  heavy  metal  contamination  as  well  as  reduce  "S" 
pits  and  other  crystalline  defects  which  are  considered  dark  current 
generation sites. Three control wafers were included with this lot which
were back abraded and heavy phosphorus gettered by CCD Processing Lab people.
This was done to determine whether the SILMAT process provided any significant
improvement in device characteristics, or whether the same results could be
obtained by in-house processing. This NE1-6S lot was also processed with
the compound dielectric of SiO2/Si3N4.

Initial test results of the NE1-6S lot indicated that the bump height of
those wafers processed with a conventional 1,000A SiO2 gate oxide was
essentially the same as those wafers processed with the ?00A Si3N4/600A
SiO2 gate dielectric. The SILMAT-processed material produced devices that
were similar in operating characteristics to devices produced with
conventional in-house gettering. As a result it was decided to continue to
use in-house gettered material, rather than go through the expense and
additional handling required to obtain SILMAT-processed material. All
subsequent lots were mechanically abraded and heavy phosphorus doped to
reduce sources of dark current.

7.3.2.2 Buried Channel Operation

Lot NE1-1BC was produced in March 1978; however, meaningful test results
from this lot were not obtained until June. (Refer to Section 7.3.5 for
processing information.) A buried channel CCD shift register demonstrated
satisfactory performance by operating in a frequency range of 60 kHz to
3 MHz. Transfer efficiency for these devices was between 0.995 and 0.998
and compared favorably to devices operating in a surface channel mode. The
speed  of  the  buried  channel  device  was  limited  on  the  low  side  by  dark 
current  and  on  the  high  side  by  CCD  gate  lengths  (1.5  mil)  and  mutual  induc¬ 
tion  transients  in  the  clock  phases. 
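
To put the quoted transfer efficiencies in perspective, the fraction of a
charge packet that survives a chain of transfers is the per-transfer
efficiency raised to the number of transfers. The sketch below assumes the
0.995 to 0.998 figures are per-transfer values; the transfer counts are
hypothetical and are not the lengths of any register on these chips.

```python
def fraction_retained(eta, transfers):
    """Fraction of a charge packet left after the given number of transfers."""
    return eta ** transfers


# Per-transfer efficiencies quoted in the text; transfer counts are hypothetical.
for eta in (0.995, 0.998):
    for transfers in (16, 64, 256):
        kept = fraction_retained(eta, transfers)
        print(f"eta={eta}: {transfers:3d} transfers -> {kept:.3f} retained")
```

Because the loss compounds, a small change in per-transfer efficiency
becomes a large difference in retained charge over a long array.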

The  source  of  the  excessive  dark  current  that  previously  plagued  these 
devices  was  identified  as  coming  from  the  guard  ring  buried  channel  implants, 
as  regulated  by  the  phased  gates.  Operating  these  gates  at  sufficiently 
large  voltage  levels  permitted  excess  charge  injection  into  the  channel. 

This  charge  came  from  the  generation  centers  in  the  depletion  regions. 

In  August  1978,  the  decision  was  made  to  discontinue  making  NE1  lots. 

The desire to complete the DP5 mask set (32-bit adder/subtractor/exclusive-
OR array) also necessitated suspension of any work on the NE2 design
(3x5 array).

The  first  DP5-1  lot  was  completed  in  October  1978.  During  the  deposi¬ 
tion  of  poly  I,  an  interruption  of  the  process  occurred  due  to  a  power 
outage, causing a discontinuity in the CVD deposition cycle. Though the
correct  amount  of  poly  film  was  deposited  after  the  system  was  put  back 
into  operation,  the  discontinuity  somehow  extended  the  etch  rate  so  that 
fine  filaments  of  polysilicon  remained  on  the  field  oxide.  These  filaments  were 
too  fine  to  notice  during  periodic  inspection  following  plasma  etching  of 
the  poly  layer.  They  were  discovered,  however,  when  high  resistance  shorts 
were  found  between  adjacent  poly  lines.  The  DP5-1  lot  was  therefore 
rejected  as  untestable. 


Due  to  the  large  size  of  the  DP5  chip  (approximately  0.4  inch  on  a 
side)  the  alignment  targets  across  the  wafer  were  spaced  too  far  apart  for 
normal  alignment.  This  problem  was  corrected  by  changing  the  location 
of  the  alignment  target  near  the  edge  of  the  chip,  to  a  location  near 
the  chip’s  center.  The  redesigned  mask  set,  incorporating  this  modifica¬ 
tion,  was  designated  DP5A. 

It was also determined that the scribe line areas on two layers were
omitted,  adding  to  the  alignment  problem.  This  error  was  also  corrected 
on  the  DP5A  mask  revision. 

While  awaiting  the  DP5A  revised  mask  set,  some  tests  were  performed  on 
the  half  adder  of  DP5-1  and  some  useful  results  were  obtained;  however,  the 
large  number  of  poly  I  layer  shorts  made  it  impossible  to  accurately 
control  test  conditions. 

In  January  1979,  lot  DP5A-1  was  completed  and  delivered  to  test.  Several 
mask errors, such as omitted contacts to polysilicon gates, were found by
careful  inspection  of  the  mask  set  during  processing.  All  mask  errors 
appeared  to  be  in  the  32-bit  arithmetic  array;  no  errors  were  found  in  the 
test  devices.  The  decision  was  made  at  that  point  not  to  start  any  new  lots 
until  another  complete  inspection  of  all  test  devices  and  the  arithmetic 
array  could  be  performed  and  appropriate  corrections  made. 

In  essence,  this  completed  the  work  accomplished  on  the  Navy's  DCCD 
development  program.  No  new  DP5A  lots  were  fabricated  as  the  ultimate 
complexity  of  this  large  32-bit  array  would  seem  to  defeat  the  ability  of 
the  design  group  to  uncover  all  potential  errors  without  a  commitment  of 
manpower  and  time  beyond  the  remaining  resources  of  the  program. 

Despite the difficulties encountered on this program, a viable, repeatable
process was generated that has since been employed to fabricate a number of
complex DCCD LSIs. Two significant programs, the AZIMUTH CORRELATOR
DEVICE (ACD) and the FAST HADAMARD TRANSFORM (FHT) chips, have employed the
identical processing sequence generated during the final phase of the Navy
Program. A detailed listing of this DCCD processing sequence is provided
in Table 7.1.

TABLE 7.1. THE DCCD DOUBLE POLYSILICON PROCESS FLOW CHART

STEP  OPERATION                                 REMARKS

 1.   Select Wafer Material; Record Thickness   10 ohm-cm P-type Mtl. <100>
      and Resistivity
 2.   Clean Wafers                              Standard 5 min. Buf Dip
 3.   Field Oxidation                           F.O. Thickness = 8,000 Å @ 1,075°C
 4.   Apply Photoresist for Channel Stop Cut    Positive Resist AZ111
 5.   Etch Channel Stop Cut
 6.   Strip & Clean Photoresist
 7.   Oxidation and N2 Anneal of Channel        Oxide Thickness = 200 Å @ 920°C
      Stop Cut
 8.   Implant Channel Stop                      Boron (B11), 2 x 10^[exponent
                                                illegible] @ 100 keV
 9.   Clean Surface of Oxide followed by        15 min. @ 1,075°C in N2
      Anneal of Implant Damage
10.   Apply Photoresist for Channel Cut         Resist AZ111
11.   Etch Channel Oxide
12.   Strip & Clean Photoresist
13.   Buffered Dip and Clean                    Insures Clean Silicon Surface Before
                                                Gate Oxide Growth Cycle
14.   Oxidation and N2 Anneal of Channel Cut    Oxide Thickness = 1,000 Å @ 920°C
15.   Polysilicon I Film Deposition             CVD Deposition; Poly Film Thickness
                                                = 4,000 Å
16.   N+ Diffusion                              N+ Furnace @ 950°C
17.   Photoresist Preparation                   Clean Poly Surface, Followed by
                                                Oxidation of Poly Surface @ 900°C
                                                (Permits better adhesion of
                                                Photoresist)
18.   Apply Photoresist to Poly I Film          HMDS and Photoresist 1350J
19.   Buffered Dip and Plasma Etch              Provides Poly I Pattern
20.   Plasma Etch Wafer Backs
21.   Strip & Clean Photoresist
22.   Etch Channel Oxide and Clean              This Etch Prepares Silicon Surface
                                                for Poly II Gate Oxide Growth Cycle
23.   Oxidation and N2 Anneal for               Oxide Thickness = 1,000 Å @ 920°C
      Poly II Gate Oxide
24.   Polysilicon II Film Deposition            CVD Deposition; Poly Film Thickness
                                                = 3,500 Å
25.   N+ Diffusion                              N+ Furnace @ 950°C
26.   Buffered Dip                              To Remove Phosphorus From Surface
27.   Photoresist Preparation                   Oxidation of Poly Surface at 900°C
                                                (Permits better adhesion of
                                                Photoresist)
28.   Apply Photoresist to Poly II Film         HMDS and Photoresist 1350J
29.   Buffered Dip and Plasma Etch              Defines Poly II Pattern
30.   Plasma Etch Wafer Backs
31.   Strip & Clean Photoresist
32.   Photoresist Preparation                   10 min. @ 900°C in [gas illegible]
33.   Apply Photoresist For NMOS                Photoresist AZ111
      Source-Drains
34.   Etch Oxide in Source-Drain Regions
35.   Strip & Clean Photoresist
36.   N+ Diffusion For Source-Drains            N+ Furnace @ 950°C
37.   Buffered Dip followed by Resistivity      Insures sufficient doping of S-D's
      Check
38.   TEOS Deposition and Densification         TEOS Thickness = 5,000 Å; TEOS
                                                deposited at 730°C, with
                                                densification at 920°C
39.   N+ Getter and Photoresist Prep.           N+ Furnace Used
40.   Apply Photoresist For Contact Holes       Photoresist AZ111
41.   Etch Contact Holes (Vias)                 Buffered HF Used to Clear Holes
42.   Strip & Clean Photoresist
43.   Deposit Metal (Aluminum + 4% Cu)          Metal Film Thickness = 10,000 Å
                                                Front; 5,000 Å wafer backs
44.   Apply Photoresist to Metal Film           HMDS and Photoresist 1350J
45.   Etch Metal Interconnection Pattern
46.   Strip & Clean Photoresist
47.   Sinter Metal                              Approx. 1 hr. @ 400°C
48.   SILOX Deposition                          SILOX Thickness = 4,000 Å with
                                                3% Phos
49.   Apply Photoresist to SILOX Film           HMDS and Photoresist 1350J
50.   Etch SILOX Passivation Pattern
51.   Strip & Clean Photoresist


7.4  THE IMPACT OF THE DCCD PROGRAM ON TRW'S PROCESSING TECHNOLOGY


The difficulties encountered in both the design and fabrication of
DCCD LSI's emphasized the necessity of tight process standardization, which
could only be brought about by extensive process automation. As the chief
advantages of DCCD's include very high cell density coupled with MHz clock
rates, the necessity of very tight device geometries and very accurate
processing is apparent. As a result of the many problems requiring solution
in order to standardize DCCD fabrication, TRW has decided to embark on a
relatively extensive automation program, in an attempt to provide DCCD
device yields comparable to those obtained by other state-of-the-art MOS
technologies.

The  automation  program  will  include  all  major  processing  technologies: 

7.4.1  Mask-to-Wafer  Alignment 

This effort involves the purchase of proximity alignment equipment capable
of one micron line resolution. It will provide cassette-to-cassette wafer
transfer, deep UV light sources (235 nm) and split field imaging.

Projection  alignment  equipment  significantly  increases  mask  life  and  will  be 
considered  during  the  next  major  upgrade  of  the  MOS  Processing  Facility. 

TRW  has  also  purchased  an  E-beam  direct  writing  system  that  can  be  used 
for  patterning  micron-sized  gate,  conductor  and  interconnection  patterns, 
channel  stop,  and  P+  guard  ring  patterns  directly  onto  the  photoresist-coated 
wafers. 

Direct-Step-On-Wafer  equipment  is  also  being  considered  for  the  MOS 
Fabrication  Facility  as  a  means  of  improving  line  tolerances,  reducing 
geometries,  and  significantly  increasing  LSI  density. 

7.4.2  Photoresist  Coating,  Developing  and  Removal 

Current  line  widths  are  now  at  the  3-4  micron  level.  The  densities 
demanded  of  custom  LSI  circuits  during  the  early  1980's  will  necessitate 
a  photolithographic  technology  capable  of  delivering  1  -  2  micron  lines  in 
a  day-to-day  production  environment.  TRW  has  already  purchased  the  most 
advanced  wafer  handling  equipment  available  (GCA  Wafertrac  Equipment),  which 
will  soon  be  installed  in  the  MOS  Processing  Facility.  This  equipment  is 
capable  of  automatic  scrub/bake,  coat/bake,  develop/bake  operations;  all 
wafers are transported to the various processing stations by means of air
bearing  tracks.  Wafers  are  inserted  and  removed  from  the  processing 
sequences  by  means  of  cassette-to-cassette  wafer  handling  equipment.  The 
entire  system  is  microprocessor  programmed  and  any  subsystem  malfunction 
is  brought  to  the  attention  of  operators  by  automatic  alarms.  Removal  of 
operator  manipulation  from  this  key  processing  sequence  insures  uniform 
processing  of  wafers  and  therefore  higher  yields. 

Within  the  next  few  years,  Wafertrac  equipment  will  provide  automatically 
scrubbed,  baked,  photoresist-coated  and  baked  wafers  directly  to  the 
projection  alignment  station,  where  cassettes  will  feed  uniformly  processed 
wafers  directly  to  a  microprocessor-controlled  aligner,  that  provides  0.2 
micron  alignment  accuracy,  based  upon  automatic  pattern  recognition  system 
capabilities. 

7.4.3  Microprocessor-Controlled  Diffusion  Furnaces 

As MOS device geometries are reduced to dimensions that provide
short-channel operation, Xj distances related to source-drain implants will
be reduced to 0.2 - 0.3 microns. The control of implant drive-in
temperatures and time must necessarily be quite precise to permit the
fabrication of devices that operate with the desired threshold,
punch-through and breakdown voltages. Ultimately, tight control over
furnace zone temperatures and drive-in periods can be obtained by means of
microprocessor controllers; these devices continuously monitor all three
heat zones of each furnace and provide an alarm should one of these zones
change temperature by more than ½-degree Centigrade.
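
The zone-monitoring behavior described above can be sketched in a few lines
of Python. This is only an illustration of the alarm logic, not the
controller actually used; the zone names, the setpoint and reading values,
and the tolerance passed in are all assumptions made for the example.

```python
from typing import Dict, List


def zones_out_of_tolerance(setpoints_c: Dict[str, float],
                           readings_c: Dict[str, float],
                           tolerance_c: float) -> List[str]:
    """Return the heat zones whose reading deviates from its setpoint
    by more than tolerance_c (degrees Centigrade)."""
    alarms = []
    for zone, setpoint in setpoints_c.items():
        if abs(readings_c[zone] - setpoint) > tolerance_c:
            alarms.append(zone)
    return alarms


# Hypothetical three-zone furnace held at 950 C (zone names are illustrative).
setpoints = {"source": 950.0, "center": 950.0, "load": 950.0}
readings = {"source": 950.2, "center": 951.4, "load": 949.9}

# The tolerance is a parameter; 0.5 C is used here only as an example value.
print(zones_out_of_tolerance(setpoints, readings, tolerance_c=0.5))
```

In a real controller the same comparison would run continuously against
thermocouple readings, with the returned list driving the operator alarm.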

Other  important  aspects  of  microprocessor  controlled  furnaces  include 
low  temperature  growth  (900°C)  of  gate  and  intermediate  field  oxides,  with 
subsequent  ramping  of  these  tubes  to  a  higher  temperature  (1,000°C)  for 
proper  annealment.  The  ability  to  ramp  diffusion  furnaces  up  and  down 
without  removing  wafers  from  the  tube  reduces  the  number  of  tubes  required 
by a processing facility. Also, it reduces handling of the wafer boats, as
a wafer lot can go through a doping and diffusion cycle at different
temperatures, without being removed from the tube in which the initial
doping is accomplished.

7.4.4  Dry Etching of Polysilicon, Nitride, Oxide and Metal Patterns

The  MOS  Processing  Laboratory  currently  uses  both  wet  and  dry  etching 
of  LSI  films,  although  the  direction  of  the  etching  technology  shows  a 
clear  preference  for  dry  (plasma)  etching  over  wet  chemistry  processes. 
Processing of MOS and DCCD devices indicates the need for greater precision
in etching critical self-aligned gate structures. This implies little or
no undercut of gate patterns and metallization patterns where line widths
are less than 2 microns.

Current laboratory capabilities include a barrel-type plasma etcher
that provides less than a 1:1 undercut. As gate and interconnection line
widths  decrease  to  less  than  2  microns,  there  will  be  a  general  demand  for 
planar  plasma  etching  equipment  capable  of  providing  true  anisotropic 
etching  capabilities.  TRW  has  recently  purchased  planar  etching  equipment 
and  will  soon  employ  it  as  a  means  of  defining  poly-gate  levels  of  MOS  and 
DCCD  LSI's. 
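
As a rough illustration of why undercut becomes critical below 2 microns,
the following Python sketch estimates the line width left after etching.
It is a simplified geometric model, not a result from this program: it
assumes the lateral undercut at each edge equals the undercut ratio times
the film thickness and that there is no over-etch. The function name and
its parameters are hypothetical.

```python
def etched_linewidth_um(drawn_width_um: float,
                        film_thickness_angstrom: float,
                        undercut_ratio: float) -> float:
    """Estimate remaining line width after etching.

    undercut_ratio is lateral undercut per unit of vertical etch depth
    (1.0 = fully isotropic, 0.0 = ideal anisotropic etch).
    """
    undercut_um = undercut_ratio * film_thickness_angstrom * 1e-4  # 1 A = 1e-4 um
    return drawn_width_um - 2.0 * undercut_um  # both edges are undercut


# A 2 micron line in a 4,000 A poly film (the Poly I thickness in Table 7.1).
for ratio in (1.0, 0.5, 0.0):
    width = etched_linewidth_um(2.0, 4000.0, ratio)
    print(f"undercut ratio {ratio:.1f}: {width:.2f} um remaining")
```

Under these assumptions a fully isotropic (1:1) etch removes 0.4 micron
from each edge of a 2 micron line, which is why anisotropic planar etching
is attractive at these dimensions.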

7.5  DCCD  TECHNOLOGY  SUMMARY 

DCCD  fabrication  processes  for  both  the  DP  and  NE  series  of  devices 
evolved through many mask layout iterations. These changes were needed to
improve  circuit  operating  parameters,  such  as  dynamic  range  and  transfer 
efficiency.  Each  new  generation  of  circuits  which  employed  higher  density 
geometries,  also  necessitated  significant  changes  in  the  DCCD  processing 
sequence,  or  a  different  arrangement  of  processing  steps.  Most  of  these 
technology  variations  involved  poly  I  and  II  gate  layout  patterns.  Table 
7.2  lists  the  various  DP  and  NE  generations,  including  significant  differences 
between  generations. 

Table 7.2  DP and NE Design Generations.

Circuit             Mask Type           Gate Technology     Remarks
Designation

N1 through N4       Conventional        Poly & Metal        * Field Oxide = 15,000 Å
                                                            * P-Channel Technology
                                                            * Gate Oxides: 1000/2000 Å

NAV5 through NAV11  Conventional        Poly & Metal        * Technology concentrated on
                                                              Floating Gate Devel.

DP-0                Slip Masks          Poly & Metal        * Test circuits designed to verify
                                                              Floating Gate FET discharge
                                                              concept
                                                            * 6 mask levels required
                                                            * Poly Protect film used

DP-1                Slip Masks          Poly with Poly      * TEOS used as a protective film
                                        Protect Mask/Metal    over poly, to help metal step
                                                              coverage
                                                            * Gate Oxides: 1000/3000 Å

DP-2, (-2A)         Slip Masks (-2);    Poly/Poly           * Contained 8 + 8 Adder
                    also Conventional                       * Contained 3x3 Multiplier
                    Masks (-2A)                             * First use of Double Poly Gates
                                                            * Arsenic doping of Poly Films
                                                            * Switch to N-Channel on lot
                                                              DP-2A-1N

DP-3, (-3A)         Conventional        Poly/Poly           * Contained 8x8 Multiplier
                    (-3A, mask                              * Contained 16 + 16 Adder
                    revision)                               * First lot of Buried Channel
                                                            * Thick Field Oxide & Channel
                                                              Stops used together to define
                                                              CCD Channel
                                                            * Field Oxide = 10,000 Å for
                                                              improved poly & metal line
                                                              definition
                                                            * 10 ohm-cm P-type material
                                                            * Intergate Barrier or "Bump"
                                                              identified in SEM photos
                                                            * First attempt to deposit
                                                              SiO2/Si3N4 Dielectric

NE-1                Conventional        Poly/Poly           * Contained 12 N-Channel Test
                                                              Structures
                                                            * 25 ohm-cm material used
                                                            * NE1-1BC first N-Buried Channel
                                                              Lot
                                                            * Mechanical Abrasion of Wafer
                                                              Backs used for 1st time in
                                                              combination with heavy N+
                                                              Gettering to lower dark current
                                                            * NE-1-6S used to evaluate
                                                              SILMAT-processed material

NE-2                Conventional        Poly/Poly           * Contained 12 cascaded full
                                                              adders; also multiplier cell
                                                            * One lot only fabricated

DP-5                Conventional        Poly/Poly           * Contained 32-bit Adder/
                                                              Subtractor/Exclusive-OR
                                                            * Chip Size = 0.4" x 0.4"
                                                            * Circuit used full Adder design


7.5.1  Mask  Generations 

Although conventional mask sets were selected for the first mask
generations of this program (N1 - N4; NAV5 - NAV11), a commitment to slip
mask use was made during fabrication of the DP-0, -1 and -2 series. Slip
masks were selected for the DP series primarily to save program funds.

The  very  complexity  of  the  DCCD  technology  required  many  mask  level  changes 
to  correct  layout  errors,  occasional  errors  created  by  the  computer  graphics 
system  (Applicon  System),  and  circuit  modifications  based  upon  test  circuit 
performance. 




Slip  masks  provided  three  or  four  mask  levels  per  plate;  during  the 
alignment  of  successive  mask  levels,  the  slip  mask  was  displaced  in  either 
the  X  or  Y  direction.  Combining  four  mask  levels  onto  one  plate  reduced 
mask  fabrication  costs  and  facilitated  quick  lot  turn-around.  The  major 
problem  with  slip  masks  involved  alignment  difficulties  created  by 
extensive  mask  "run-out"  and  "rotation"  between  successive  mask  levels. 

Normal lens aberrations did not permit the mask manufacturer (Electromask,
Inc.) to compensate for dimensional tolerance variations between mask levels.

While slip masks were used for the DP-0, DP-1 and DP-2 mask generations,
alignment difficulties, combined with low yields, necessitated the
abandonment of this approach in favor of conventional mask sets, which
provided superior results. Conventional mask sets were therefore used for
the later versions of the DP-2A, DP-3 and -3A series. Further improvements
in circuit yields could have been obtained if proximity or projection
alignment equipment had been available to produce these high density
circuits.

7.5.2  Gate  Technology 

Two  significantly  different  approaches  were  evaluated  during  the  early 
phases  of  this  program  to  determine  the  most  reliable  and  reproducible 
method of fabricating CCD gate structures. Aluminum metal gate structures
were initially selected for Poly II levels, as they basically require a
simpler process than polysilicon gate structures. Aluminum gates were used
on DCCD series N1 through N4, NAV5 through NAV11, DP-0 and DP-1. None of
the processes used to fabricate these various DCCD generations employed an
isoplanar technology; the resulting topography of these various designs
was quite complex and metal step coverage became a significant factor in
obtaining workable devices. The DCCD processing technology was extensively
reworked  for  DP-2,  to  permit  the  use  of  a  dual  polysilicon  technology.  The 
use  of  poly  for  both  gate  levels  resulted  in  a  significant  improvement  in 
step  coverage  as  well  as  device  yields. 

Initially, the polysilicon films were deposited in an RF-heated reactor,
which permitted in-situ doping and annealment of these films with limited
success. The problems encountered with RF reactor deposition centered around
film thickness uniformity; it became a significant problem to control
poly film thicknesses to better than 20%. In addition, the deposition
reaction required the use of hydrogen as a reducing agent; this created
the potential for explosions, forcing the decision to build a hot wall
CVD reactor.

In 1976, MEC personnel completed the design and fabrication of a
CVD reactor for polysilicon deposition. This tube has been used
successfully during the latter half of this DCCD program. In general, the
CVD polysilicon deposition system produced very uniform films of the
required thickness (approximately 4,000 Å), superior in uniformity to those
obtained from the RF-heated system. Following a one hour pump-down, the
polysilicon film is deposited at approximately 630°C; deposition of the
4,000 Å film required 20 minutes. The only significant drawback of this
system is the necessity to dope the polysilicon film in an N+ or P+
furnace, to insure polysilicon gate and interconnection patterns of
sufficiently low resistivity.
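
The deposition figures quoted above imply an average growth rate, which
the short Python sketch below works out. The linear-growth assumption and
the reuse of the same rate for the 3,500 Å Poly II film of Table 7.1 are
assumptions made for illustration; the report does not state a Poly II
deposition time.

```python
def deposition_rate_a_per_min(thickness_angstrom: float, minutes: float) -> float:
    """Average deposition rate, assuming growth is linear in time."""
    return thickness_angstrom / minutes


def deposition_time_min(thickness_angstrom: float, rate_a_per_min: float) -> float:
    """Time needed to reach a target thickness at a constant rate."""
    return thickness_angstrom / rate_a_per_min


# 4,000 A deposited in 20 minutes (figures quoted in the text).
rate = deposition_rate_a_per_min(4000.0, 20.0)

# Assumed: the same rate applies to the 3,500 A Poly II film.
poly2_minutes = deposition_time_min(3500.0, rate)

print(f"rate = {rate:.0f} A/min, Poly II time = {poly2_minutes:.1f} min")
```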

Separate  doping  lengthens  and  complicates  the  processing  sequence.  In-situ 
doping  of  polysilicon  films  will  be  the  subject  of  an  intensive  development 
effort  in  the  near  future,  to  reduce  operator  handling  and  increase 
processing yields. The only limitation of the present CVD system appears to
be the necessity to deposit, then dope, the poly film in a conventional
manner, which extends processing length and complexity. Some companies now
producing LSI devices employ in-situ doping of these films; however, the
details of this technology innovation are considered "company proprietary"
information. Doping and diffusion systems manufacturers have not yet solved
this problem, as systems incorporating this important feature are not yet
available.

7.5.2.1  Gate and Field Oxide Undercut Problems

Significant  gate  oxide  undercut  was  encountered  during  fabrication  of 
initial  DP  series  lots;  this  became  a  significant  yield  loss  problem 
following  adoption  of  the  double  polysilicon  gate  process  (DP-2  Series). 

Several  process  changes  were  made  to  avoid  this  problem.  A  special 
photoresist  masking  step  was  used  to  cover  polysilicon  gates  and  interconnects 
before  oxide  cuts  were  made.  This  special  "poly  protect"  masking  step 
successfully prevented oxide undercut, as shown in Figure 7-2. Note that
severe oxide undercut as shown in Figure 7-2 would result in broken
metallization lines caused by poor step coverage. The "poly protect"
processing sequence did much to eliminate yield losses caused by this
phenomenon.


Figure 7-2.  Polysilicon Protect Configuration.

a.  Polysilicon pattern defined.

b.  Oxide etched without "poly protect" masking
    step, showing undercut of polysilicon gate
    interconnection pattern.

c.  Oxide etched with "poly protect", showing no
    undercut of polysilicon region, permitting
    smooth transition for subsequent
    metallization coverage.


It is unfortunate that wet etching processes must still be used to define
field and gate oxide cuts; oxide undercut can be significantly reduced by
plasma etching techniques, particularly planar plasma etchers that provide
an anisotropic etch capability. Reactive ion etching is still a relatively
new technology innovation and etching oxides by this method is a very slow
process. Although ion milling was available as a means of defining field
and gate oxide cuts, it was not used during the course of this program;
this etching technique is not selective in its removal of film material
and would have necessitated an extensive experimental program to insure
good device yields.

7.5.2.2  Gate Control by Means of Gate Oxide Thickness Adjustment

An extensive number of processing variations were made and evaluated
to improve gate control of charge packets. Poly I and II gate oxide
thicknesses were varied from 1000/2000 Å to 1000/3000 Å; later variations
included 500/2400 Å. Both boron and phosphorus-doped poly II layers were
made, in an attempt to simultaneously dope source-drain and self-aligned
polygate structures. Doping polygate structures with boron provided poor
results, as source-drain drive-in time and temperature requirements usually
resulted in boron penetration of the relatively thin gate oxides with
resultant doping of the channel. In most instances, this changed the
threshold voltage characteristics of the devices sufficiently to make the
circuit untestable.

A review of the processing history indicates that a DP-1 wafer lot was
produced by doping source-drain regions and poly II gate structures
simultaneously. This was an attempt to simplify the DCCD process by
eliminating the nitride layer which was used at that point in process
development. Besides serving as a poly II protect mask against boron doping
of the already arsenic-doped poly film, the nitride film also prevented
boron diffusion into and doping of the gate oxide, with resultant changes
in channel threshold characteristics. Elimination of the nitride protect
film on this DP-1 lot produced very low threshold voltages, as well as some
depletion-type devices. As previously stated, boron penetration through the
gate oxide was suspected. In the worst case, boron penetration will form a
heavily doped P-type region immediately beneath the gate oxide, resulting
in depletion-type devices. Subsequent tests of DP-1 devices verified this
occurrence. A common technique used in the study of impurity penetration
through gate oxide involves measurement of the depth of the junction formed
by the impurity's penetration. A determination of the time necessary for
penetration of the SiO2 gate oxide layer to a particular depth in the
silicon can then be made.

It is obvious that significant electrical effects due to boron penetration
will occur in CCD devices long before a PN junction is detected. At the onset
of boron penetration, ionized boron atoms will cause the formation of a positive
charge layer at the Si/SiO2 interface, shifting the flatband voltage of the
MOS device. Movement of the flatband voltage can be detected as an equivalent
change in the MOS device's threshold voltage. C-V measurements carried out
on test capacitors clearly showed boron penetration. In addition, boron
penetration was not uniform and varied across the wafer, as determined by
variations in test transistor threshold voltage measurements.

As  a  result  of  depletion-type  device  formation,  boron  doping  of 
polygate  structures  associated  with  PMOS  devices  was  abandoned  in  favor  of 
in-situ  doping  with  arsenic  during  RF  deposition  of  the  polysilicon  film. 

DCCD technology advancements will require thin polygate structures and
interconnections (on the order of 2,000 Å). As these polygate film
thicknesses decrease, it will become more difficult to dope these films to
achieve high conductivity.

One  approach  toward  solving  this  problem  involves  modifying  CVD  deposition 
to  enable  in-situ  doping  of  the  polysilicon;  a  second,  less  conventional, 
approach  is  refractory  metal  deposition  and  drive-in,  providing  a  metal- 
doped  polysilicon  film  of  exceedingly  low  resistivity.  Both  of  these 
technology  approaches  will  be  attempted  within  the  near  future. 

7.5.2.3  Thermally Grown Field and Gate Oxides

The DCCD fabrication process employs relatively low temperature (920°C)
gate oxides; field oxides are grown at 1075°C. In general, wet oxidation
cycles are used, as dry oxidation cycles take approximately five times the
growth period needed for a wet cycle. Wet oxides are grown by steam
oxidation of the silicon substrate; the steam is produced by an in-situ
reaction of H2 and O2 gases supplied to an oxy-hydrogen torch that provides
the super-heated steam. Measured Qss values were approximately
2 x 10^11 cm^-2; in this instance, Qss represents fixed surface states or
interface charges that occur during the thermal oxidation cycle. Nss, or
fast surface state density, values were usually in the 10^9/cm^2-eV range.
As great care is taken to prevent furnace contamination by mobile (alkali)
ions, HCl steaming of furnaces is done on a periodic basis. B-T stress
testing of furnaces in the MOS/CCD Processing Lab usually indicates mobile
ion concentrations that are less than the sensitivity of the C-V
measurement equipment.
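
To give a sense of scale for the fixed-charge densities discussed here,
the Python sketch below estimates the flatband voltage shift produced by a
sheet of positive fixed charge. It uses the standard first-order relation
(shift = -q N / Cox, with Cox = K eps0 / tox) and assumes, for
illustration only, that all of the charge sits at the Si/SiO2 interface and
that the oxide has a relative permittivity of 3.9. The function names are
hypothetical.

```python
Q_ELECTRON = 1.602e-19   # electronic charge, coulombs
EPS_0 = 8.854e-14        # permittivity of free space, F/cm
K_OXIDE = 3.9            # assumed relative permittivity of thermal SiO2


def oxide_capacitance_f_per_cm2(t_ox_angstrom: float) -> float:
    """Oxide capacitance per unit area for a given oxide thickness."""
    t_ox_cm = t_ox_angstrom * 1e-8  # 1 A = 1e-8 cm
    return K_OXIDE * EPS_0 / t_ox_cm


def flatband_shift_volts(charges_per_cm2: float, t_ox_angstrom: float) -> float:
    """Flatband shift from positive sheet charge at the Si/SiO2 interface."""
    return -Q_ELECTRON * charges_per_cm2 / oxide_capacitance_f_per_cm2(t_ox_angstrom)


# 2 x 10^11 charges/cm^2 under a 1,000 A gate oxide (values quoted in the text).
shift = flatband_shift_volts(2e11, 1000.0)
print(f"flatband shift = {shift:.2f} V")
```

Under these assumptions the shift is a little under one volt (about
-0.93 V), which illustrates why control of fixed charge matters for
threshold voltage in these devices.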

Gate  or  field  oxides  prepared  by  wet  thermal  oxidation  of  the  silicon 
substrate,  generally  have  positive  charges  associated  with  the  oxide  layer; 
as  a  result,  the  underlying  silicon  will  be  depleted  or  inverted,  if  it  is 
P-type, or will evidence accumulation if it is N-type. These charge states
may  be  classified  in  at  least  four  general  categories. 

The nature of these oxide charge states in relation to the SiO2/Si
interface is indicated in Figure 7-3. These include:

1.  Qss - fixed surface states or interface charges

2.  Q - mobile charges within the oxide (caused by alkali ion
    contamination)

3.  Nss - surface recombination centers, which act as generation
    sites of unwanted dark current leakage

4.  Nt - trapping sites within the oxide that are susceptible to
    ionizing radiation; this affects both threshold voltage and
    transfer efficiency of exposed devices.

Qss, or fixed charge, is apparently quite close to the SiO2/Si interface;
its density can vary from almost zero to approximately
2 x 10^[exponent illegible] electronic charges/cm^2. Mobile ion charges are
usually the result of processing contamination that can be introduced by
operator oversights, contaminated solvents, photoresist developer
solutions, or diffusion furnace tube devitrification. Nss is a measure of
fast surface states; the density of such active surface states can range
from less than 10^10/cm^2-eV to significantly higher values. It is believed
that the presence of these fast surface states depends upon processing
conditions, while the silicon potential determines whether or not they are
charged.

Positively charged traps in the oxide, referred to as Nt, have been observed
after exposure of a MOS device to X-rays, electron streams, or other ionizing
radiation. The concentration of these traps is of the order of
10^[exponent illegible] cm^-[exponent illegible].

Surface  charge  has  also  been  identified  as  a  factor  in  grossly  modifying 
MOS  device  performance;  it  occurs  on  the  outer  surface  of  an  oxide,  and  can 
usually  be  attributed  to  surface  contamination.  Usually  this  charge  is  a 
result of ion migration in the vicinity of a biased junction. As stated,
surface charge migration requires a conductive surface, which is usually
brought about through surface contamination.

Gate  and  Field  oxidations  are  about  to  undergo  a  relatively  dramatic 
change  in  processing  methods,  brought  about  by  pressurized  oxidation  systems 
that operate at temperatures well below 900°C. These furnaces will be
microprocessor controlled with real-time monitoring of each temperature
zone, to insure uniform, repeatable results. It is not inconceivable that
both gate and field oxides will be grown in the same furnace tubes, with
microprocessor controlled adjustments made on the basis of preprogrammed
oxidation and annealment cycles.

7.5.2.4  Clean Gate Oxide Technology

The  MOS/CCD  Processing  Laboratory  has  investigated  and  developed  a 
clean  gate  oxide  technology  for  CCD  fabrication.  Early  experiments  using 
conventional  oxidation  systems  showed  that  reproducible  fixed  charge  values 
could  be  obtained  only  with  new  quartz  tubes;  however,  these  values  deteriorated 
rapidly with time. Reproducible fixed charge values (Qss) were found when
double wall quartz tubes were used; however, even these relatively stable
values drifted over an extended period of time. Changes in Qss were also
attributed to substrate crystal orientation, with greater changes in Qss
occurring in <111> than in <100> substrates. A more rapid increase in
Qss was also observed in oxides grown in Spectrosil furnace tubes than in
GE #204 quartz tubes; it was claimed that Spectrosil quartz is practically
free from metallic impurities, with total impurity concentrations equal to
one part per million. A spectroscopic analysis of the quartz used in these
tubes indicated a sodium content of approximately 20 ppm, which appeared as
its major impurity. An analysis of GE #204 tube quartz indicated an aluminum
content of 40 to 50 ppm, as well as a sodium content of 20 to 30 ppm.

Experiments directed toward elimination of high Qss produced some interesting
results; it appears that very fast quenching during removal of wafers from the
furnace hot zone can reduce fixed charge (Qss) to a minimum. When oxides
with very small fixed charge values (1 x 10^11 cm^-2) were quenched in
[medium illegible] to -200°C, or annealed in gaseous [species illegible] at
a relatively low furnace temperature (550°C), fixed charge could be
virtually eliminated and experimental curves close to the theoretical
values could be obtained. It is suspected that aluminum and sodium atoms
that are found in thermally grown oxide originate in the quartz furnace
tubes and are the source of the observed increase in Qss values. The
changes in fixed charge caused by rapid cooling or quenching may be the
result of a shift in the state of the aluminum atoms, from an
interstitial to a substitutional position in the oxide lattice structure.

Figure 7-3. Fixed Charge as a Function of Quartz Tube Age.


7.5.2.5 Polysilicon Gate Technology

From the inception of the DCCD program, polysilicon electrodes were used as
an integral part of the processing technology. The use of polysilicon gates
also facilitates the use of self-aligned gate structures, which cannot be
implemented effectively with a metal gate technology. Polysilicon films
therefore became a very important feature of the DCCD structure. The chemical
and physical characteristics of these films had to be carefully controlled,
and deposition, doping, and etching had to be worked out in detail.

Polysilicon films have been deposited by thermal decomposition of
silane in an RF-heated, horizontal epitaxial reactor, using hydrogen as the
carrier gas. The resultant crystalline structure is very sensitive to
deposition conditions. The film's crystalline structure determines its
electrical characteristics, particularly its resistivity. A temperature of
650°C is the optimum deposition temperature for polysilicon films formed in
an RF-type epi reactor. Films formed at higher temperatures usually had a
poorer, less uniform, coarse-appearing structure. The deposition rate at
650°C was approximately 1000 Å/minute. Since these films were used as
conductors and, in some instances, as diffusion masks, no dopants were added
to lower the resistivity of the film. Neither DCCD device nor circuit
performance was adversely affected by the higher resistivity gate and
interconnecting structures.

Increased circuit complexity forced reconsideration of polysilicon
film doping. Despite the difficulties encountered in doping RF-deposited
polysilicon films, techniques were developed to provide in-situ doping to
ensure improved conductivity of relatively long polysilicon traces. BBr3
was employed to diffuse boron into the polysilicon films, and POCl3 was used
to diffuse phosphorus into polysilicon films; both dopants lowered film
resistivity. When it was discovered that these dopants often penetrated
the thin gate oxides beneath the doped polygate structures, arsenic was
selected as the best alternative to prevent this unwanted occurrence.
Diffusion temperatures of 950°C provided the most reliable, uniform results.

7.5.2.5.1 The Single Level Polysilicon Process

This process was employed during the initial phases of the DCCD program.
It required N-type silicon wafers, with a resistivity of 3 to 5 ohm-cm and
<100> orientation, to minimize surface charge. At first, a 15,000 Å thick
field oxide was grown by means of the dry oxidation-wet N2-dry N2 oxidation
cycle. The CCD channels were defined by means of positive photoresist,
followed by a wet oxide etch. The gate oxide grown in the channel area was
1,000 Å thick and was produced in a gate oxidation furnace operating at 920°C;
wet oxidation was achieved by means of an oxy-hydrogen torch. These steps
were followed by deposition of a 3500 Å polysilicon film in an RF-heated
epi reactor. Initially, before in-situ doping was achieved, the poly film
was phosphorus-doped in a separate diffusion furnace. The poly film was then
covered by a nitride film (Si3N4), approximately 200 Å to 400 Å thick. The
nitride film was slightly oxidized to improve photoresist adhesion. The CCD
gate patterns were then defined, followed by plasma etching of both nitride
and polysilicon films in a barrel-type plasma etcher. Poly definition was
followed by an additional oxidation step, performed to increase the thickness
of the oxide covering the channel region. The additional oxide acted as a
channel mask to ensure that boron penetration did not occur during
source-drain diffusions. This boron diffusion followed definition and etching
of field oxide over the source-drain regions. The Si3N4 covering the polygate
structure was then removed by chemical etching, to ensure that the
polysilicon pattern was not disturbed. Channel oxide between polysilicon
gates was then removed, so that a new, clean gate oxide could be thermally
grown for the metal gate patterns that were subsequently deposited and
defined.

Etching of the channel oxide between polysilicon gates was followed by
the growth of a 2000 Å thermal oxide that covered both the exposed
channel areas and the defined polysilicon gate and interconnection
pattern. Contact holes through this oxide layer were then defined and etched;
this was followed by an aluminum film deposition. The aluminum film was then
defined into a combined metal gate pattern as well as a circuit interconnection
and termination pad pattern. Sintering of these wafers occurred at 450°C.
The sintering sequence employed gaseous N2, followed by sintering in
[illegible]. Circuits fabricated during the early phases of this program did
not receive a SILOX passivation step.

7.5.2.5.2 The Double Polysilicon Process

A  two-level  polysilicon  process  was  adopted  and  used  for  DP-2,  DP-3 
and  DP-5  series.  The  processing  sequence  is  quite  similar  to  that  provided 
in  Table  7.1.  Device  cross-sections  are  provided  in  Figure  7-4. 

A switch was made to N-channel DCCD technology with lot
DP-2A-1N ("N" for N-channel). The N-channel process is therefore significantly
different from the preceding P-channel single level polysilicon process. The
N-channel double polysilicon process required 10 ohm-cm, <100>, P-type starting
material. Initially, an 8,000 Å field oxide was grown by means of wet oxidation
at 1075°C. This step was followed by a channel stop cut and subsequent
implant that defined CCD channel perimeters. Annealing of channel stop
implant damage was followed by CCD channel definition; this included removal
of field oxide over all CCD channel areas. A second oxidation and anneal of the
channel regions produced a clean gate oxide, approximately 1,000 Å thick, in
all channel areas. This was followed by CVD deposition of the first
polysilicon film (approx. 4,000 Å thick), which was patterned into poly I gates
and interconnecting structures. (As previously noted, the poly I film was
doped by an N+ diffusion at 950°C, to ensure good polygate conductivity.)
Definition of poly I gate structures was accomplished by means of plasma
etching in a barrel-type etcher.

A second channel etch at this point in the process prepared the channel's
surface for the growth of a new, clean, poly II gate oxide film 1,000 Å
in thickness; this step was followed by CVD deposition of the poly II film.
Thermal oxidation of the channel to provide the poly II gate oxide also grew
an insulating oxide layer over all poly I structures; this is of particular
importance to prevent poly I and II shorts from occurring in those areas
where poly II gates overlapped poly I gates.

The poly II film was deposited 3500 Å thick and was also N+-doped.
Following the N+ doping of the poly II film, the wafers were processed through
a buffered "dip" that removed excess phosphorus from the wafer's surface.
NMOS source-drains were then defined and subsequently doped in the N+ furnace.
Source-drain doping was followed by a buffered dip and resistivity check to
ensure that the appropriate doping level of the source-drains was achieved.

A TEOS film, 5,000 Å thick, was deposited over the surface of the wafers
and densified at 920°C. These wafers were then inserted into the N+ furnace
to provide a thin phosphorus coating over the wafer's surface for N+ gettering
of heavy metal ion impurities. Contact holes were then etched through the
TEOS and thermally-grown oxides to provide vias to poly gates and source-drain
areas. 10,000 Å of copper-doped aluminum was then deposited by means of
electron beam gun evaporation in an ultra-high vacuum deposition system. The
metal interconnection and termination pad pattern was then defined, followed
by a sintering step of one hour at 450°C.

The surface of the wafer was protected by means of a deposited passivation
film of SILOX, approximately 4,000 Å thick. The SILOX was then patterned
to expose circuit termination pads, completing the fabrication process. The
process presented here has been used for the fabrication of all DCCD
circuits designed within the last 18 months.

7.5.3 Progress Towards an Isoplanar


The complexity of DCCD technology has brought about a reduction in device
yields that is not in keeping with its overall potential. The density of
DCCD LSI's will soon be at a point where it will be impractical to decrease
poly and metal line widths and spaces without a significant decrease in film
thicknesses as well. Further increases in DCCD circuit density will be
brought about through adoption of a multilevel metal interconnection technique
that will permit three levels to be used for intracell and intercell
connections. The key to this technology milestone will be a further reassessment
of device and circuit geometry, directed towards an isoplanar topography.

Additional emphasis will be placed on thinner field oxides as well as
thinner polysilicon gate and supporting interconnections. Intermediate metal
level thicknesses will be limited to a few thousand Angstroms, in lieu of
the 10,000 Å now used for top level metallization. The ability to design cell
structures that simplify circuit topography will become a critical
consideration, in order to achieve higher density and good yields. The most
significant factor that will detract from DCCD device yields will be step
coverage or level-to-level transition of micron-wide poly or metal lines.

Both CCD design and processing people now believe that poly and metal line
step coverage or line breakage is one of the major problems challenging
high yields and good performance of DCCD devices. Thermally-grown field
oxides used in combination with deposited and densified SiO2 films may help
solve this problem and permit the realization of this technology's true
potential.

7.5.4 Metallization Problems and Solutions

A  metal  step  coverage  problem  was  identified  during  the  early  phases 
of  the  DCCD  program  as  seen  in  the  photograph  of  a  DP-0  circuit. 


Figure 7-6. Breaks in the Al Metallization - DP-0 Circuit.

The cavity caused by the undercut field oxide is shown in the SEM photograph.
A simplified drawing which illustrates the cavity created by the undercut
oxide is shown in Figure 7-5; this cavity was created during etching of the
1500 Å oxide covering the areas to be occupied by metal gates, in addition
to the steep walls of the polysilicon film edge. Traversing this step, with
its built-in cavity, is difficult for narrow line widths of aluminum
metallization; LSI circuits have many thousands of these oxide steps which
must be traversed by metal lines. Unless an absolutely reliable method is
provided to ensure conductor continuity for the many thousands of conductors
and interconnections that comprise a complex LSI circuit, yields will be
virtually nil. These problems were encountered during the early phases of
this program on N1 through N4, NAV5 through NAV11, and on DP-0 and DP-1
circuits.

A concerted effort was made, at this stage of processing technology
development, to solve the metal step coverage problem. Filling these
so-called cavities by means of a deposited film of oxide appeared to offer a
potential solution. Both SILOX and TEOS were evaluated. Films formed by
thermal decomposition of tetraethyl orthosilicate (TEOS) produced very smooth
transitions over polygates with undercut gate oxides; this is illustrated by
the drawing in Figure 7-5. Whereas TEOS was capable of filling in the
undercut region beneath the polygate edge, SILOX was deficient, coating the
surface and edges of the polygates only. The decision to incorporate
densified TEOS as a practical solution to this problem resulted in
an immediate improvement in device yields.

Incorporating  the  new  TEOS  layer  required  an  extra  mask  level.  The 
resultant  process  involved  depositing  TEOS  over  the  polysilicon  film  before 
the  "poly  protect"  mask  operation.  After  the  "poly  protect"  masking  step, 
the oxide in the channel was etched away and fresh gate oxide was regrown.
Figure  7-7  shows  a  metallized  poly  step  covered  with  TEOS,  indicating  good 
coverage,  whereas  Figure  7-6  is  an  SEM  of  a  metallized  poly  step  without  TEOS. 
Metal  discontinuities  can  be  observed  in  this  area;  also,  note  the  steepness 
of  the  poly  step,  which  is  responsible  for  the  metal  step  coverage  problem. 

The  problem  of  depositing  and  defining  reliable  metal  gate  structures 
used  on  early  DCCD  circuits,  was  eliminated  by  adopting  a  double  poly  process 
for  the  DP-2,  DP-3  and  DP-5  series  lots.  Excellent  step  coverage  was  obtained. 
In  addition,  the  "poly  protect"  step  was  also  eliminated,  simplifying  the 
DCCD  process  sequence. 

Some  additional  step  coverage  problems  occurred  with  DP-2  circuits  as 
shown  in  Figure  7-5. 

Figure 7-7. Source/Drain Contact to Poly Contact Structure.

Coincident edges of polysilicon traces over oxide cuts provided very steep
steps that significantly reduced chances of reliable, unbroken step coverage.
Note that the polysilicon trace neither overhangs nor protects the edges of the
P+ diffusion; any undercut of the gate oxide beneath the edge of the
polysilicon film permits the deposited aluminum to short the P+ region to the
N-substrate. This design oversight was corrected on DP-3 designs.


Additional metallization problems encountered during the early phases
of this program included the selection of a satisfactory metal sintering
temperature. The metal sintering temperature was lowered to 400°C during
the processing of DP-2 lots. Test results indicated that this time vs
temperature sequence was not sufficient to cancel X-ray damage produced during
electron beam gun deposition of the aluminum metallization film. Additional
problems encountered included poor ohmic contacts to both diffusion areas and
polygate interconnections. Schottky diodes also affected CCD performance. In
subsequent lots, a temperature of 450°C was again used to prevent the appearance
of unwanted Schottky diodes, and good ohmic contacts to diffusion and poly
contact regions were again established.


Another metallization problem was uncovered when a high percentage of
shorts-to-substrate occurred where metal contacts were made to poly structures
that lay over thin thermally-grown oxides (1,000 to 2,000 Å thick). This
design approach was carefully deleted from all future cell layouts, starting
with the DP-3 series.


8.0  COMPUTER  AIDED  DESIGN 


8.1  INTRODUCTION 

At the time DCCL was in its developmental stages, existing computer
aided design (CAD) systems serviced only transistor oriented technologies.
Thus, at the outset, these existing systems were not useful for DCCD designs.
The approach taken in developing a CAD system for DCCD designs was to evolve
both the CAD system and the DCCD development simultaneously. The goal was
to find the best trade-offs between the restrictions of CAD systems and the
flexibility of layout. The result of this effort is a complete MOS/CCD design
and layout system which provides front-to-back LSI photo mask production in
periods of two to three months.

8.2  DEVELOPING  A  CCD  DESIGN/LAYOUT  CAD  SYSTEM 

Existing CAD systems have been structured entirely around transistor devices.
For such CAD systems the devices and interconnects are represented symbolically.
The symbols are designed in very specific ways for each device such that
when the symbols alone are put together by the system rules (often referred to
as spacing guides) the CAD software automatically assembles the actual chip
layout. Such is the case for the oxide aligned transistor (OAT) technology
developed by TRW.

The fundamental feature of CCD designs, which sets them apart from the other
members of the MOS family, is that there are not necessarily any source or
drain diffusions by which basic devices can be coupled together into a
functional node structure. CCD's are primarily made up of overlapping gate
electrodes, which involve such considerations as getting the desired number
of gates into a space allocation and defining the CCD channel properly (by
making certain that the gate oxide and channel stop definition patterns
are continuous and properly positioned under the gate electrode). Thus,
existing CAD LSI layout systems initially had few provisions that were
directly applicable to CCD devices.

In the experimental stages of CCD development the devices are simple
and are evaluated individually at the wafer level. At this stage the demands
placed on a CAD system are few. It merely provides a quick and convenient
way of getting the simple configurations off the drawing board and onto mask
plates. The test configurations are digitized into the computer one or
two rectangles at a time with the finest positioning grid available. The
grid units here are Applicon Graphic System (AGS) units of which there are
32,000 by 32,000. The scaling that was chosen for CCD was 40 AGS = 1.0 mil.
Although this approach offers almost total flexibility in layout, there are
some drawbacks.

Relative to the more automated layout systems, the piece-wise assembly
of the various devices and configurations is immensely time consuming in
addition to incurring a high potential for error. If there were no other
approach to CCD design/layout, it would most certainly be very impractical
as an LSI technology. Clearly, for CCD to become a useful element of the
MOS design family, some trade-offs had to be made between CAD system
restrictions and design and layout flexibility.
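
The grid scaling quoted above (40 AGS units per mil on a 32,000 by 32,000
unit grid) implies both a positioning resolution and a maximum drawable
extent. The following Python sketch works through that arithmetic. The
constant and function names are illustrative assumptions and are not part
of the AGS software.

```python
# Illustrative arithmetic only; names below are assumptions, not AGS software.
AGS_UNITS_PER_MIL = 40     # scaling chosen for CCD layouts (from the text)
AGS_GRID_EXTENT = 32_000   # grid units available along each axis (from the text)


def mils_to_ags(mils: float) -> int:
    """Convert a dimension in mils to whole AGS grid units.

    Raises ValueError if the dimension does not land on a grid point.
    """
    units = mils * AGS_UNITS_PER_MIL
    if abs(units - round(units)) > 1e-9:
        raise ValueError(f"{mils} mil does not fall on the AGS grid")
    return round(units)


def grid_resolution_mils() -> float:
    """Smallest positioning step, in mils."""
    return 1.0 / AGS_UNITS_PER_MIL


def max_extent_mils() -> float:
    """Largest dimension that fits on the grid along one axis, in mils."""
    return AGS_GRID_EXTENT / AGS_UNITS_PER_MIL


if __name__ == "__main__":
    print("resolution:", grid_resolution_mils(), "mil")
    print("maximum extent:", max_extent_mils(), "mil")
    print("0.5 mil =", mils_to_ags(0.5), "AGS units")
```

Under these assumptions the finest positioning step is 0.025 mil and the grid
spans 800 mils per axis, so a 400 mil dimension corresponds to 16,000 units.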

8.3  EVOLUTION  OF  DCCD  AS  AN  LSI  TECHNOLOGY 

There are four main aspects to the approach:

(1) Develop a technique of quickly and safely manipulating very large
sections of a design data base: The most powerful feature available on the
AGS system is the ability to nest cells. Nesting cells is a technique whereby
a cell, consisting of a collection of components and basic devices, can be
nested in another cell (this is nesting two levels deep). This means that
the first cell's position in the data base is controlled by the positioning
of the latter. AGS provides for up to 16 levels of nesting. This, in itself,
presents the problem of keeping track of the device and component allocations
between the levels.

(2) Develop a schematic representation of the design: This includes a
schematic representation of CCD that is compatible with existing schematic
forms for MOS. The schematic would allow a number of people, other than the
design engineer, to effectively check designs before they go into fabrication.
Schematics are also known to be helpful for testing and troubleshooting
in any technology.

(3) Develop a symbology for all the basic DCCD devices and interconnect:
Developing a system of symbology to be used on the AGS for layout would form
the heart of a workable CAD system. Rough layout diagrams could be generated
with symbols in a very short time, and thus chip dimensions and operating
specifications could be sized early in a program.

(4) Obtain an effective automatic design rule check: Automatic design
rule checking is becoming an essential part of the overall CAD system for
LSI designs. The goal in this part is not to be limited to checking only the
spacing rules of the symbols, but rather to check the spacing rules of the
actual output patterns.

8.4  PRESENT  MOS/CCD  DESIGN  AND  LAYOUT  SYSTEM 

The  present  CAD  system  for  MOS/DCCD  takes  full  advantage  of  the  LSI  graphics 
capability  afforded  by  AGS.  A  fully  developed  system  of  schematics  for  the 
MOS/DCCD  is  now  in  use.  A  partial  automatic  design  rule  check  is  being  used 
as  part  of  the  die  design  check  procedure  and  a  complete  design/layout  manual 
has  been  created  which  provides  a  framework  and  source  documentation  for  the 
entire  CAD  system  that  has  been  developed. 

MOS/DCCD layouts now use the nested cell capability of the AGS. The
maximum number of nesting levels currently employed in MOS/DCCD layouts is
four. The allocations between the nesting levels
are as follows: Level 0 contains fixed components such as standard transistors,
capacitors, standard subsections of shift register, and fundamental components
of the DCCD half adder and charge fan-out. Level 1 controls everything on
level 0 plus the additional components that complete the basic devices (i.e.
half adders, charge fan-out, MOS signal interface, MOS timing and control and
logic signal generators). Level 2 controls level 1 (and therefore level 0)
plus all components of interconnect which tie basic devices into larger functional
blocks. These functional blocks can be repeated, moved within the data base,
and moved to other data bases with great ease. All internal relationships
are maintained automatically by the AGS software. Level 3 controls the
functional blocks, their interconnect, and all peripheral interconnect of pads,
reticle alignment targets and chip identification number.
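
The nesting scheme described above can be illustrated with a small data
structure in which each cell holds its own rectangles plus placements of
lower-level cells, so that moving a parent cell moves everything nested
inside it. This is a minimal Python sketch of the idea only. The class,
method names and coordinates are hypothetical and do not represent the AGS
data base format.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    """A layout cell: local rectangles plus placed child cells (hypothetical model)."""
    name: str
    rects: list = field(default_factory=list)     # (x, y, w, h) in local grid units
    children: list = field(default_factory=list)  # (child Cell, dx, dy) placements

    def place(self, child: "Cell", dx: int, dy: int) -> None:
        """Nest a child cell at an offset relative to this cell's origin."""
        self.children.append((child, dx, dy))

    def flatten(self, ox: int = 0, oy: int = 0) -> list:
        """Return all rectangles in absolute coordinates, resolving every nesting level."""
        out = [(x + ox, y + oy, w, h) for (x, y, w, h) in self.rects]
        for child, dx, dy in self.children:
            out.extend(child.flatten(ox + dx, oy + dy))
        return out

    def depth(self) -> int:
        """Number of nesting levels at and below this cell."""
        return 1 + max((c.depth() for c, _, _ in self.children), default=0)


# Four levels, loosely following the level 0 to level 3 allocation in the text.
gate = Cell("gate", rects=[(0, 0, 10, 4)])   # level 0: fixed component
half_adder = Cell("half_adder")              # level 1: basic device
half_adder.place(gate, 5, 5)
half_adder.place(gate, 25, 5)
block = Cell("block")                        # level 2: functional block
block.place(half_adder, 100, 0)
chip = Cell("chip")                          # level 3: whole die
chip.place(block, 1000, 2000)

if __name__ == "__main__":
    print(chip.depth())
    print(chip.flatten())
```

Because the child positions are stored relative to the parent, relocating
the functional block requires changing only one placement entry; the internal
relationships of everything below it are preserved without further edits.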

A system of schematics has been developed for the MOS/DCCD technology.
Figure 8-1 shows a listing of all schematic symbols and descriptions. This
system has been implemented on the AGS and thus enjoys all the advantages
of editing, cell replication and nested cell data base management. A full
set of formal schematics is created for each design before it enters the check
cycle. The schematics provide an effective means of controlling the design
and layout throughout the entire process. The schematics have also proven
to be invaluable in the troubleshooting and testing of new designs.

Figure 8-1. Index of schematic symbols and descriptions.


The DCCD design process now includes a form of automatic design rule
check. It is run on the CDC time sharing system (TSS) at TRW. This design
check code was configured specifically for the TRW triple diffused (3-D)
technology. Thus, there is a limited amount of design checking that it can do
for the MOS/CCD technology. The design checks that are currently being done
by this code are for minimum contact hole coverage by metal, minimum
rectangle dimensions on all levels and minimum spacing between rectangles
within a given level. In all these checks, the minimum dimension is specified
by the user for each level submitted. The program flags all rectangles that
are directly involved in any dimension that is less than the specified minimum.
The remaining design checks for minimum spacing and overlap are done visually
between the following levels:

CHANNEL OXIDE - CHANNEL STOP

POLYSILICON I - POLYSILICON II

CHANNEL STOP  - POLYSILICON I & II

SOURCE/DRAIN  - CHANNEL OXIDE
                POLYSILICON I & II

CONTACT       - CHANNEL STOP
                CHANNEL OXIDE
                POLYSILICON I & II
                METAL

PASSIVATION   - METAL
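
The three automated checks described above (minimum rectangle dimension,
minimum spacing within a level, and contact hole coverage by metal) can be
sketched in a few lines of Python. This is not the TRW design check code,
whose internals are not given here. Rectangles are assumed to be (x, y,
width, height) tuples, spacing is measured as the Euclidean gap between
edges, touching or overlapping rectangles are treated as one shape, and a
contact is assumed to be covered by a single metal rectangle.

```python
from itertools import combinations
from math import hypot


def gap(a, b):
    """Edge-to-edge distance between two (x, y, w, h) rectangles; 0 if they touch or overlap."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    dx = max(0, max(ax, bx) - min(ax + aw, bx + bw))
    dy = max(0, max(ay, by) - min(ay + ah, by + bh))
    return hypot(dx, dy)


def check_level(rects, min_dim, min_space):
    """Return sorted indices of rectangles involved in any sub-minimum dimension or spacing."""
    flagged = set()
    for i, (_, _, w, h) in enumerate(rects):
        if min(w, h) < min_dim:
            flagged.add(i)
    for (i, a), (j, b) in combinations(enumerate(rects), 2):
        # A gap of zero means the rectangles form one shape, so it is not a violation.
        if 0 < gap(a, b) < min_space:
            flagged.update((i, j))
    return sorted(flagged)


def contact_covered(contact, metal_rects, min_overlap):
    """True if some single metal rectangle encloses the contact by min_overlap on all sides."""
    cx, cy, cw, ch = contact
    return any(
        mx <= cx - min_overlap
        and my <= cy - min_overlap
        and mx + mw >= cx + cw + min_overlap
        and my + mh >= cy + ch + min_overlap
        for (mx, my, mw, mh) in metal_rects
    )


if __name__ == "__main__":
    poly = [(0, 0, 10, 10), (13, 0, 10, 10), (40, 0, 3, 10), (60, 0, 10, 10)]
    print(check_level(poly, min_dim=5, min_space=5))
    print(contact_covered((10, 10, 4, 4), [(8, 8, 8, 8)], min_overlap=2))
```

As in the text, the minimum values are supplied by the user per level, and
the result is a list of the rectangles directly involved in a violation
rather than a pass or fail verdict. The inter-level checks listed above would
need pairwise comparisons across two rectangle lists, which this sketch does
not attempt.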

A  complete  MOS/DCCD  design  and  layout  manual  has  been  created.  It  covers 
every  major  aspect  of  the  CAD  system  that  has  been  developed  for  MOS/DCCD. 
Figure  8-2  shows  a  top  level  flow  diagram,  taken  from  the  manual,  showing  all 
the  major  steps  in  the  present  design  and  layout  procedure. 

9.0 RELATED DCCD PROJECTS


The following sections provide a general description of several DCCD
projects that were made possible by the DCCD technology resulting from this
program.

9.1  FAST  HADAMARD  TRANSFORM  PROJECT 

Data communications have become an important complement to military
avionics in many command and control applications. Data transmission is now
accomplished by digital communication systems that take advantage of their
greater efficiency and superior error control. Television, however, has
continued to use analog signal processing techniques. This is a result
of the high data rates (typically 60 Mbps) and concomitant wide bandwidths
required by conventional Pulse Code Modulation (PCM) of video. Typical
parameters required for good fidelity include sampling rates of 10 MHz
(minimum), with 6 to 8 bits per sample. Nevertheless, the switching, storage
and processing advantages of digital data have led many commercial and
military applications to favor digital television transmission. Fortunately,
digital television coding techniques have shown that significant compression
of high data rates can be achieved through exploitation of spatial, temporal
and spectral (color) redundancy present in normal color video data.
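
The 60 Mbps figure quoted above follows directly from the stated sampling
parameters, since an uncompressed PCM bit rate is the sampling rate times
the number of bits per sample. The short Python sketch below reproduces that
arithmetic; the function name is an illustrative assumption.

```python
def pcm_bit_rate_mbps(sample_rate_mhz: float, bits_per_sample: int) -> float:
    """Uncompressed PCM bit rate in Mbps: sampling rate (MHz) times bits per sample."""
    return sample_rate_mhz * bits_per_sample


if __name__ == "__main__":
    # Sampling rate and word lengths are the values quoted in the text.
    for bits in (6, 7, 8):
        print(bits, "bits/sample ->", pcm_bit_rate_mbps(10.0, bits), "Mbps")
```

At the minimum 10 MHz sampling rate, 6 bits per sample gives the 60 Mbps
quoted in the text and 8 bits per sample gives 80 Mbps.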

Transform coding algorithms perform a one or two dimensional unitary
transformation on the input data followed by some form of quantization of
the resulting coefficients. At the receiver, the inverse transform is
performed to reconstruct the original data within particular degradation
limits. The Hadamard, Fourier, Haar, Slant, and Cosine transforms all possess
qualities desirable for data compression.

The chief disadvantages of transform encoding are implementation complexity
related to large data block sizes (needed for effective de-correlation) and
high sampling rates. With two-dimensional (spatial) data blocks, which can
take advantage of spatial picture correlation, computation rates become
intense. For example, using the straightforward FFT, an 8 x 8 sample data
block requires 3 "butterfly" operations per sample. Besides the complex
multiply and add functions, special logic must be included to limit numerical
noise effects. Such systems are practical only through the use of LSI
technology to reduce subsystem chip and component complexity. For general
applicability, a sufficient data range is provided by a Hadamard transform
chip that performs a single stage of a pipeline version transform algorithm.

The  design  of  a  Fast  Hadamard  Transform  (FHT)  chip  allows  the  same  device  to 
be  used  at  any  stage  of  either  the  forward  or  reverse  transform  for  data 
block  sizes  as  large  as  8  x  8  (Figure  9-1).  In  addition,  all  of  the  multiplex 
clocking  and  control  is  included  on  chip. 

The interconnection of four typical FHT cells is shown in Figure 9-2.
Each cell consists of two full-adders, four multiplexing AND gates, and a
feedback delay. The I/O multiplex signals in an FHT A5 (first Hadamard stage,
5th significant bit) or FHT A6 (first Hadamard stage, 6th significant bit)
require a single clock phase delay, while in the B5 and B6 stages (second
Hadamard stage, fifth and sixth bits), they require two clock phase delays.
These delays are part of the Hadamard encoding technique and double at each
stage as the data progresses across the chip.
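
The stage-by-stage structure described above, with a pairing distance that
doubles at each stage, matches the textbook fast Walsh-Hadamard transform.
The Python sketch below is a software illustration of that general
algorithm, not a model of the FHT chip: it operates on whole numbers in
memory rather than bit-serial charge packets, and the function name and
natural (Hadamard) output ordering are assumptions.

```python
def fht(data):
    """Unnormalized fast Walsh-Hadamard transform of a power-of-two length sequence."""
    x = list(data)
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    span = 1  # distance between the two inputs of each butterfly
    while span < n:
        for start in range(0, n, 2 * span):
            for i in range(start, start + span):
                a, b = x[i], x[i + span]
                # Each butterfly needs only one addition and one subtraction.
                x[i], x[i + span] = a + b, a - b
        span *= 2  # the pairing distance doubles at every stage
    return x


def inverse_fht(coeffs):
    """Inverse transform: the same butterflies followed by division by the block length."""
    n = len(coeffs)
    return [v / n for v in fht(coeffs)]


if __name__ == "__main__":
    block = [1, 2, 3, 4]
    coeffs = fht(block)
    print(coeffs)
    print(inverse_fht(coeffs))
```

Two properties of the algorithm line up with the text. Every stage uses only
additions and subtractions, and the forward and inverse transforms share the
same butterfly network apart from a scale factor, which is consistent with a
single device serving any stage of either transform. For a 64-sample (8 x 8)
block there are 6 stages of 32 butterflies each, or 3 butterflies per sample.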

The carry-bit from Stage A5 transfers vertically to Stage A6 after a
one-bit delay through the full-adder, and the input data to Stage A6 must
also be delayed by one bit to ensure that they arrive synchronously with the
carry-bit. This skewing of the data is inherent in any DCCL pipeline array to
accommodate the finite charge transfer times.
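
The data skewing described above can be sketched with a small clocked model
of a pipelined ripple-carry adder. In this sketch each bit position has its
own full-adder cell, the carry from cell k is latched for one clock before it
reaches cell k+1, and the inputs to cell k are therefore delayed by k clocks.
The function name, the word format, and the discarding of the final carry are
assumptions for illustration; this is not a model of the FHT cell itself.

```python
def skewed_pipeline_add(pairs, width):
    """Add a stream of (a, b) word pairs with a bit-skewed adder pipeline.

    Results are returned modulo 2**width (the final carry is discarded).
    """
    n = len(pairs)
    carry_reg = [0] * (width + 1)   # carry_reg[k]: carry latched into cell k
    sums = [0] * n
    for t in range(n + width - 1):  # clock ticks, including pipeline drain
        next_carry = [0] * (width + 1)
        for k in range(width):
            w = t - k               # skew: cell k sees word w at clock w + k
            if 0 <= w < n:
                a, b = pairs[w]
                a_bit = (a >> k) & 1
                b_bit = (b >> k) & 1
                c_in = carry_reg[k]
                sums[w] |= (a_bit ^ b_bit ^ c_in) << k
                # carry out is latched and reaches cell k + 1 one clock later
                next_carry[k + 1] = (a_bit & b_bit) | (c_in & (a_bit ^ b_bit))
        carry_reg = next_carry
    return sums
```

A new word pair enters the least significant cell on every clock, so the
throughput is one addition per clock even though each result takes `width`
clocks to emerge.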

All  of  the  arithmetic,  multiplexing,  and  digital  CCD  Logic  functions 
required  for  the  FHT  chip  were  designed  for  inclusion  on  a  test  chip 
designated  FHT-0  (see  Section  4.4  for  functional  results). 

The 8 x 8 version of the FHT chip has a completed initial layout at this
time. This chip is required to operate at 2.4 MHz and dissipate less than
750 mW, and it measures just under 400 mils on a side.

9.2  AZIMUTH  CORRELATOR  DEVICE 

Arrays of ACD circuits are intended for use in Synthetic Aperture Radar
(SAR) on-board data processors for space missions. In the first planned
application, an array of 1024 ACDs will be used to perform azimuth
correlations in a Developmental Model SAR Processor (DMSP). The DMSP
functional block diagram, Figure 9-3, shows the relationship of the azimuth
correlator to the other elements of the DMSP. The ACD receives data from the
range correlator via the ACD input sample bus; the data are provided as a
contiguous sequence of "range-lines." Each range-line comprises 2560 complex
data samples accompanied by several sync signals. The basic operation
performed by the ACD during a range-line processing cycle is to select the
appropriate set of 1024 complex data samples from each range-line, multiply
each data sample by the appropriate complex azimuth reference function (ARF)
coefficient, and add this product to the appropriate cell in the accumulator
register. (A new set of reference function coefficients is required for each
new set of range-line data samples input to the ACD during an image-line
processing cycle.) In the DMSP, an image-line processing cycle will involve
1020 range-lines. At the end of an image-line processing cycle, each ACD
circuit outputs its accumulated 1024 complex image-line samples. The
image-line memory is filled with zero levels during the image-line readout
process and a new image line is initiated.
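
The select, multiply, and accumulate cycle described above can be sketched as
follows. This is a minimal illustration only: the function name, the use of a
simple start offset to select samples from each range-line, and the
coefficient table layout are all assumptions, since the report does not give
the selection rule or the ARF values here.

```python
def azimuth_correlate(range_lines, offsets, coeffs, n_cells):
    """Accumulate one image line from a sequence of range-lines.

    range_lines : list of range-lines, each a list of complex samples
    offsets     : hypothetical start index of the selected samples per line
    coeffs      : one list of n_cells complex ARF coefficients per line
    n_cells     : number of accumulator cells (1024 in the DMSP)
    """
    accumulator = [0j] * n_cells           # image-line memory starts at zero
    for line, start, arf in zip(range_lines, offsets, coeffs):
        selected = line[start:start + n_cells]
        for j in range(n_cells):
            # complex multiply by the reference coefficient, then accumulate
            accumulator[j] += selected[j] * arf[j]
    return accumulator
```

In the DMSP case described above, `range_lines` would hold 1020 lines of 2560
samples each and `n_cells` would be 1024; small values are convenient for
checking the arithmetic by hand.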

A  functional  block  diagram  of  the  Azimuth  Correlator  Device  is  shown  in 
Figure  9-4.  This  DCCD  chip  contains  both  a  complex  parallel  multiplier  and 
a  large  array  of  CCD  memories. 

The  primary  functional  blocks  for  the  ACD  were  designed,  fabricated,  and 
tested on two evaluation masks designated ACD-0 and ACD-2. The performance
results  to  date  on  these  circuits  are  described  in  Sections  4.2  (Adder),  4.3 
(Multiplier),  and  4.5  (Memory). 

9.3  SORT  AND  MERGE  (SAM) 

A key requirement of an EW pulse processor is its ability to use the
information that characterizes each pulse (i.e., Time of Arrival (TOA), Angle
of Arrival (AOA), Pulse Width, and Pulse Amplitude) and classify, or sort,
each pulse into groups having similar parameters. Thus, all pulses having an
AOA within specific boundary constraints can be grouped, as can pulses having
a similar Pulse Repetition Interval (PRI) that are within an AOA boundary.
Further processing can then be performed on selected groups having particular
parameters, while other groups can be ignored. This can significantly reduce
the required processing as most signal environments contain far more pulses
that are not of interest than those that are of interest. Figure 9-5 shows
a functional block diagram of a pulse processing algorithm which can be
applied to a batch of characterized pulses to eliminate those pulses that
are received from angular locations (relative to the receiver) that are not
of interest. Pulses that have been processed and geo-located (with a
rather coarse resolution of 1/128th of a circle) are then sorted for PRI.

This  final  sorting  groups  pulses  by  potential  emitters  and  thereby  reduces 
the  load  of  any  post  processing  equipment. 
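
The angular gating and grouping steps described above might be sketched as
follows. This is an illustrative sketch, not the algorithm of Figure 9-5: the
pulse representation as (TOA, AOA in degrees) tuples, the function names, and
the use of successive TOA differences as candidate PRIs are all assumptions.
Only the 1/128th-of-a-circle resolution is taken from the text.

```python
from collections import defaultdict

SECTORS = 128  # coarse angular resolution: 1/128th of a circle


def aoa_sector(aoa_deg):
    """Quantize an angle of arrival in degrees to one of 128 sectors."""
    return int((aoa_deg % 360.0) * SECTORS / 360.0)


def gate_and_group(pulses, sectors_of_interest):
    """Drop pulses outside the sectors of interest; group the rest by sector.

    Each pulse is a hypothetical (toa, aoa_deg) tuple.
    """
    groups = defaultdict(list)
    for toa, aoa_deg in pulses:
        sector = aoa_sector(aoa_deg)
        if sector in sectors_of_interest:
            groups[sector].append(toa)
    return dict(groups)


def pri_estimates(toas):
    """Successive TOA differences within one group, as candidate PRIs."""
    ordered = sorted(toas)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]
```

Discarding pulses by sector before any PRI work is done reflects the point
made above: most pulses in a typical environment are not of interest, so
rejecting them early reduces the load on later stages.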

The  possibility  of  using  DCCD  to  perform  the  sort  and  merge  functions 
required  for  EW  processors  emerged  in  late  1978.  A  number  of  techniques  were 
studied for compatibility with DCCD implementation. These are discussed in
some  detail  in  Section  6.3. 

These  studies  concluded  that  the  most  favorable  candidate  for  a  DCCD  sort 
and  merge  system  is  a  radix  exchange  algorithm  (see  Section  6.3.2).  In  the 
radix exchange algorithm, an M-long list of unsorted N-bit wide words is
sorted by decision logic into two registers, depending on the binary value
of the most significant bit. Since a last-in-first-out (LIFO) memory is
required to store the sorted N-bit wide words, the application of a high bit
density CCD memory to the sort and merge function was apparent. A 1.2K-bit
experimental LIFO was designed and placed on the ACD-2 test mask for evaluation.
Unfortunately, program cost and schedule constraints did not permit the
functional evaluation of this design.
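
A software sketch of a radix exchange sort is given below for reference. It
splits the list into two registers on the most significant bit and then
repeats the split on each register for the next lower bit. The recursive
formulation and the use of Python lists to stand in for the two LIFO
registers are assumptions for illustration; the DCCD implementation studied
in Section 6.3.2 is not reproduced here.

```python
def radix_exchange_sort(words, n_bits):
    """Sort non-negative integers of at most n_bits bits, MSB first."""

    def sort_on_bit(items, bit):
        if bit < 0 or len(items) <= 1:
            return items
        zeros, ones = [], []             # the two registers fed by the decision logic
        for word in items:
            if (word >> bit) & 1:
                ones.append(word)        # bit is 1: goes to the "ones" register
            else:
                zeros.append(word)       # bit is 0: goes to the "zeros" register
        # every word with a 0 in this bit precedes every word with a 1
        return sort_on_bit(zeros, bit - 1) + sort_on_bit(ones, bit - 1)

    return sort_on_bit(list(words), n_bits - 1)
```

Each pass examines one bit of every word, so an M-long list of N-bit words
needs at most N passes over the data.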

9.4  THE  TWENTE  REPORT 

An internal report on digital charge coupled devices has been written
by J.W.M. Jansen and published (1) by Twente University of Technology, Enschede,
The Netherlands. This study was initiated as a result of technical papers
that described DCCD logic cells as well as the results obtained from work
performed on this contract.

The Twente report contains a comprehensive analysis of a DCCL "AND" gate
and a half-adder. It also describes the design and test results of an
n-channel, double-polysilicon, DCCL evaluation chip. This chip contained a
half-adder, an AND/OR-gate, a signal regeneration (fan-out) circuit and two
multiple value logic circuits. The regeneration (fan-out) circuit differs
from TRW designs as it converts charge into a voltage; fan-out then creates
new charge packets.

The half-adder uses a floating-capacitor as the charge sensing node of
the charge transfer electrode in an identical manner to the DPI design.

Since  the  carry  output  is  taken  from  the  master  side  of  the  charge  sense 
electrode  instead  of  the  T-gate,  charge  regeneration  is  not  achieved 
(again similar to the DPI design). A simple pseudo-one-phase clocking scheme
was used for these designs. This scheme requires a single clock line plus a
dc clock line. In order to provide directionality, a dc biasing voltage is
required to produce a surface potential step. Correctly functioning AND/OR
gates and half-adders were demonstrated for clock speeds from 10 kHz to
500 kHz.


10.0 RECOMMENDATIONS FOR FUTURE WORK


10.1  DESIGN 

The latest DCCD multiplier (6 x 4 bit 2's complement) and adder (10 + 10 bit)
designs, with their demonstrated frequency and noise immunity performance,
provide the basic circuit functions needed for constructing larger arithmetic
and computational blocks.

The next step towards the realization of these blocks should include
a task for the evaluation and characterization of the interfaces between DCCD
logic and arithmetic circuits. Candidate designs for this interface
development include the serial correlator and code generator applications
described in the first (Volume I: Digital Signal Processing) study
phase of this program. The serial correlator application is particularly
attractive in that it provides an opportunity to interface multipliers,
adders, and memory. Preliminary sizing for an 8 bit version of this correlator
yields a comfortable chip size of approximately 13 mm².

Another  task  recommended  for  inclusion  into  future  program  efforts  is 
a  DCCD  design  scaling  investigation.  The  ability  to  perform  dimensional 
scaling  on  DCCD  devices  is  essential  if  they  are  to  remain  competitive  with 
the current speed/power projections for future NMOS and CMOS devices. One
primary activity anticipated for this task is the development of an
ultra-sensitive circuit for reliably detecting the reduced charge levels
resulting from this dimensional scaling.

Finally,  it  is  recommended  that  the  task  of  developing  on-chip  support 
circuits be performed. This task will include the design and development of
high speed, low power circuits for on-chip clock generation, bias voltage
generation, addressing, decoding, and general input/output functions such
as tri-state buffers and level shifters. Emerging candidate technologies
for this application are short channel CMOS and/or dynamic NMOS.


10.2  PROCESSING 


Although  the  present  DCCD  processing  sequences  and  techniques  provide 
reliable,  repeatable  circuit  performance  characteristics,  continued  process 
development is required if functional density and operating speed
improvements are to continue. The following paragraphs describe the key items
identified as achievable goals for the next phase of DCCD process development.

Additional  effort  is  required  to  maximize  circuit  density  by  reducing 
contact  hole  size  and  poly  gate  lengths,  along  with  supporting  interconnection 
patterns  and  metallization  patterns.  Reduction  of  critical  line  widths 
may be achieved through anisotropic etching (dry/plasma/reactive ion etching) of
these  fine  patterns  that  control  circuit  density. 

Elimination of the undesirable intergate charge barrier or "bump" may
be feasible by developing a reliable SiO2/Si3N4 dielectric sandwich that
can be used in lieu of a conventional gate oxide. A sufficient amount of
work has already been done to improve chances of success in this important
area. Such a compound dielectric may permit polygate doping concurrently
with associated self-aligned source-drains. Concern over boron penetration
of thin gate oxides (500 Å) could be set aside, as the density of the nitride
film would prevent dopant penetration of the CCD channel during polygate and
source-drain doping steps.

Major improvements in polysilicon film deposition can be made in order
to achieve significantly thinner films (2000 Å) with similar or perhaps lower
film resistivity. New techniques in CVD-deposited polysilicon may enable
in-situ doping, thus simplifying polysilicon deposition procedures. This
would include reduced operator handling of wafers with corresponding increases
in yields. Further gains in LSI yields are also potentially available if
significantly thinner films can be deposited that are subsequently doped by
a refractory metal that provides low resistivity values (0.5 - 5 ohms/sq).

Thin poly films having high conductivity also support the quest for a
basically isoplanar technology and higher circuit yields.

Higher circuit densities for DCCD chips may also be obtained by employing
multilayered interconnections; this implies an additional level of metal
interconnections in addition to the existing levels of polysilicon and metal.
For example, a state-of-the-art interconnection system might be comprised of
a polysilicon level for CCD and MOS gates and interconnecting leads. Densified
TEOS or SILOX would be used to separate this first level from a second metal
interconnection level comprised of refractory metal(s) such as tungsten or
Ti:W. A patterned layer of refractory metal can withstand sintering
temperatures of 450°C, forming good ohmic contacts to the polysilicon
structures. In addition, an insulating layer of SILOX can be deposited over
the high density refractory metal lines without causing pattern dissolution.
The top level of metallization can be patterned of aluminum or copper-doped
aluminum, which in turn would be protected by a passivation layer of SILOX.
There are a number of critical parameters that must still be worked out for
such a complex multilevel interconnection system to be successful.


11.0  PATENTS  AND  PUBLICATIONS 


11.1  PATENTS 

The  following  section  lists  the  patents  awarded  (and  pending)  that 
result  from  the  digital  charge  coupled  devices  technological  development 
program. 

Patent No. 4,170,041. Inventors T. A. Zimmerman and R. Handy
Logic Gate Utilizing Charge Transfer Devices

This is the basic half-adder and full-adder digital CCD with the addition
of a control gate between the input "OR-gate" (D-gate) and the slave
side of the charge transfer electrode.

Patent  No.  4,135,104.  Inventor  R.  A.  Allen 
Regenerator  Circuit 

This  is  a  digital  half-adder  in  which  the  CARRY  output  is  changed  from 
the master side of the charge transfer electrode to the "OR-gate"
(T-gate). This modification results in an automatic, fully regenerated
CARRY charge packet. A second modification causes a binary one charge
packet  to  be  inserted  into  one  of  the  inputs  each  clock  cycle,  and  the 
complement  of  the  input  binary  value  to  be  available  at  the  output  plus 
a  fully  regenerated  charge  packet  of  the  same  binary  value  as  the  input 
charge  packet. 

Patent  Pending,  Docket  No.  12-002.  Inventor  R.  A.  Allen 
CCD  Channel  Crossover 

This invention enables charge packets being transferred in a typical CCD
manner in two separate shift-registers to intersect without interference.

Patent  Pending,  Docket  No.  12-035.  Inventor  R.  A.  Allen 
DCCD  Latch  Circuit 

A  modification  to  a  DCCD  half-adder  in  which  the  exclusive-OR  (SUM) 
output  is  connected  back  to  the  input  to  form  a  latch  circuit.  Set 
and  re-set  inputs  are  available. 


Patent  Pending,  Docket  No.  12-036.  Inventor  R.  A.  Allen 
DCCD  Frequency  Divider 

A modification to a DCCD half-adder in which the AND (CARRY) output is
connected  back  through  a  shift-register  to  one  of  the  inputs.  A  binary 
one  charge  packet  is  inserted  into  the  other  input  at  each  clock  phase. 

The frequency is f = fc(n+1)/2, where fc is the clock frequency and n is
the  number  of  shift-register  stages  in  the  feedback  path. 

Patent  Pending,  Docket  No.  12-037.  Inventor  R.  A.  Allen 
DCCD  Fan-Out  Generator 

A  digital  CCD  that  uses  the  input  charge  packet  to  control  the  state  of  a 
charge  transfer  electrode.  This  electrode  then  controls  through  which 
output ports two or more full-charge packets are transferred. It is
possible  to  obtain  either  two  or  three  charge  packets  of  the  same  or 
the  binary  complement  of  the  input  charge  packet. 
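
The logic behind the regenerator circuit (Patent No. 4,135,104) listed above
can be illustrated at the Boolean level. With a binary one injected into one
input of a half-adder, the SUM (exclusive-OR) output is the complement of the
other input and the CARRY (AND) output reproduces it. The sketch below models
only this logic; charge packets, clocking, and electrode behavior are not
represented, and the function names are hypothetical.

```python
def half_adder(a, b):
    """Return (SUM, CARRY) for one-bit inputs: SUM = a XOR b, CARRY = a AND b."""
    return a ^ b, a & b


def regenerate(bit):
    """Half-adder with a binary one injected at the second input.

    Returns (complement, copy): SUM gives NOT bit, CARRY reproduces bit.
    """
    complement, copy = half_adder(bit, 1)
    return complement, copy
```

For example, `regenerate(1)` yields a complement of 0 and a regenerated copy
of 1.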

11.2  PUBLICATIONS 

All of the following technical publications discuss digital charge coupled
devices, either as tutorial papers that describe the technology or as
descriptions of system applications.


1.  T. A. Zimmerman, "Charge Coupled Devices in Signal Processing Systems: The
Analog and The Digital Approach", 1974 National Telecommunications
Conference, San Diego, December 1974


2.  T. A. Zimmerman, "The Digital Approach to Charge Coupled Device Signal
Processing", IEEE Advanced Solid State Components for Signal Processing, 1975
Circuits and Systems Conference, pp 69-82, April 1975

3.  C. S. Miller, T. A. Zimmerman, "The Application of Charge Coupled Devices to
Digital Signal Processing", International Communications Conference, San
Francisco, pp 2-20/2-24, June 1975

4.  C. S. Miller, T. A. Zimmerman, "Applying the Concept of a Digital Charge
Coupled Device Arithmetic Unit", Second International Conference on the
Application of Charge Coupled Devices, pp 199-20^, San Diego, Oct. 1975

5.  T. A. Zimmerman, R. A. Allen, "Charge Coupled Device Digital Arithmetic
Functions: Experimental Results", Third International Conference on the
Application of Charge Coupled Devices, pp 190-196, September 1976

6.  R. A. Allen, R. J. Handy, J. E. Sandor, "Charge Coupled Devices in Digital
LSI", International Electron Devices Meeting, pp 21-26, December 1976

7.  R. A. Allen, "Digital CCD Arithmetic Technology," Digest of Papers, Compcon
77, pp. 342-344, March 1977.

8.  T. A. Zimmerman, D. F. Barbe, "A New Role for Charge-Coupled Devices:
Digital Signal Processing", Electronics, pp. 97-103, March 1977.

9.  C. S. Miller, C. F. Motley, R. A. Allen, "Digital Charge Coupled Device
Technology and Digital Filters Applications," EASTCON-77, Proceedings,
pp. 30-1A/30-1H, September 1977.

10.  T.  A.  Zimmerman,  R.  A.  Allen,  R.  W.  Jacobs,  "Digital  Charge  Coupled  Logic 
(DCCL)",  IEEE  Journal  of  Solid-State  Circuits,  pp.  473-485,  October  1977. 

11.  R. A. Allen, D. J. Spencer, C. S. Miller, "LSI Video Compression and
Computational Module Utilizing Digital Charge Coupled Devices," AGARD
Conference, Proceedings, pp. 3.8-1/3.8-5, Quebec, October 1977.

12.  E.  Hyman,  R.  A.  Allen,  "The  Development  and  Application  of  a  Digital 
Charge  Coupled  Logic  (DCCL)  Arithmetic  Unit,"  1978  International  Conference 
on  the  Application  of  Charge  Coupled  Devices,  Proceedings  pp  3A-53/3A-57, 
San  Diego,  October  1978. 

13.  T.  A.  Zimmerman,  "Charge  Coupled  Devices,"  Quest,  New  Technology  of  TRW 
Defense  and  Space  Systems  Group.  Autumn  1978,  Vol.  2,  No.  2,  pp.  70-92. 

14.  D.  J.  Spencer  and  J.  M.  Anderson,  "A  Real  Time  Video  Bandwidth  Reduction 
System based on a CCD Hadamard Transform Device," Proceedings of the IEEE
1979 NAECON, Vol. 3, pp. 1218-1231, May 15-17, 1979.